SPECTRAL DISTORTION MEASURES FOR SPEECH COMPRESSION


Y. Matsuyama, A. Buzo, and R. M. Gray


ABSTRACT 


In recent years several measures of distortion between speech waveforms have been proposed as substitutes for the traditional but subjectively inadequate mean-squared error. All of these measures involve some form of distortion measure between the second order properties of the speech processes producing the waveforms instead of an average of the waveform error power. In particular, they depend on the power spectral densities or linear models of the speech process. In this report the properties and interrelations of several such measures are developed. In particular, the relative strengths or equivalences of the various measures are established, and implications and applications of these measures to prediction, detection, and coding are summarized.
Air Force Office of Scientific Research under Contract F44620-73-C-0065 and by the Joint Services Electronics Program at Stanford under Contract N00014-75-C-0601.


1. INTRODUCTION


Any communications system for human speech has a natural fidelity criterion: the subjective fidelity for a given customer, that is, whether or not the final reconstructed speech sounds "good" or "bad." For several reasons, however, it is desirable to have a mathematical fidelity criterion, a formula for computing a number from two speech waveforms that measures the "distortion" or "badness of approximation" between them. Such a mathematical criterion provides an absolute yardstick independent of individual listeners' differences of taste and may allow the theoretical analysis of such systems, e.g., the application of communications theory to develop "optimal" performance bounds with which to compare actual system performance. In addition, a distortion measure can play a crucial role in the actual operation of a communication system, for example, in coding or generalized quantization systems, where one selects a reproduction symbol from an allowed set by choosing the one having minimum distortion from the given input symbol.

To be useful, any such criterion must possess to some degree the following attributes: (1) It should be subjectively meaningful, that is, large (small) distortion should correspond to bad (good) subjective quality. (2) It should be mathematically tractable so as to allow theoretical analyses. (3) It should be computable so that the distortions resulting in an actual system can be determined. Historically the mean-squared error between waveforms has been widely used because it meets attributes (2) and (3), but it has the major drawback of not being sufficiently subjectively meaningful, especially in systems such as Linear Predictive Coded (LPC) systems. An intuitive explanation for the subjective inadequacy of mean-squared error is that the ear need only recognize the random process producing the waveform to within some accuracy, so a system need not accurately reproduce the specific waveform itself. For example, a "shh" sound is essentially a white noise process, and any waveform "typical" or "representative" or "generic" of this process (in the sense of the ergodic theorem) will sound the same, even though such waveforms may differ drastically in individual appearance and hence in mean-squared error. Demanding a small mean-squared error in a speech compression system will therefore often require far more bits and much more accuracy than the human ear requires.
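A minimal numerical sketch of the "shh" argument, assuming a unit-variance Gaussian white noise model (our choice; the text only says "white noise"). Two independent realizations of the same process are far apart in mean-squared error, yet their time-average autocorrelations, and hence any spectral estimate built from them, nearly coincide. The sequence length and seeds are arbitrary.

```python
import random

def white_noise(n, seed):
    """Unit-variance Gaussian white noise; the seed selects one realization."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def sample_autocorr(x, max_lag):
    """Time-average estimate of r(k) = E[X_n X_{n-k}] for k = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n)) / n for k in range(max_lag + 1)]

n = 20000
x = white_noise(n, seed=1)
y = white_noise(n, seed=2)

mse = sum((a - b) ** 2 for a, b in zip(x, y)) / n   # near 2 = var(x) + var(y)
rx = sample_autocorr(x, 3)
ry = sample_autocorr(y, 3)

print(f"waveform MSE         = {mse:.3f}")
print("autocorrelation of x =", [round(v, 3) for v in rx])
print("autocorrelation of y =", [round(v, 3) for v in ry])
```

The large waveform error and the nearly identical second order statistics are exactly the mismatch that motivates measuring distortion between spectra rather than waveforms.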

The highly successful LPC systems, however, model speech as a composite or "switched" source formed by outputting segments of stationary and ergodic subsources for intervals of time that are long enough for an observer to estimate the process (or model of a process) being observed and then to transmit a description of the process rather than the actual observed waveform. To measure the distortion of such a system one is naturally led to a distortion measure that measures the closeness of the original and reproduced processes or models rather than the actual waveforms, e.g., between the power spectral densities or related quantities.

To compute the distortion one must view the actual waveforms and estimate such power spectral densities via time-average correlation and Fourier transforms, spectrum analyzers, or appropriately "smoothed" estimates. For a discussion of estimating spectra from waveforms see, e.g., Brillinger [1]. Here it is assumed that the time windows are long enough for the ergodic theorem to ensure that these sample averages nearly equal their expectations, the "true" power spectral densities involved. Thus, such distortion measures can be viewed as measures of the distortion between power spectral densities or related second order properties of two processes rather than distortion between waveforms.

A general discussion and motivation, along with relevant references, for such distortion measures may be found in Gray and Markel [2], and related discussions may be found in Viswanathan et al. [3] and Makhoul [4].

In addition, some of these distortion measures on spectra or models have proved amenable to analysis, permitting the development of subjectively meaningful mathematical bounds on optimal performance in LPC coded speech with single-symbol quantization of the reflection coefficients [5,6]. These preliminary results suggest that more general techniques from information and communications theory may be applicable to obtain useful performance bounds on more general speech communication systems, for example, LPC systems followed by data compression systems with memory.

Many of these measures appear quite different, possess different properties, and have proved useful for different applications. In [2], Gray and Markel develop some properties and interrelations and discuss the application of some of these measures. In this paper we expand their work by developing more of the properties and interrelations of their distortion measures and some other related distortion measures.

Of particular interest is the question of when a class of distortion measures is "equivalent" in the sense that good (bad) performance under one measure in the class means good (bad) performance under any other. This implies that if one member of the class is subjectively meaningful, then so are all of the others, and hence a designer can select a distortion measure from the class on the basis of tractability or computational efficiency for a particular problem. A second goal of this paper is to provide some interpretations and implications of these distances for coding, prediction, and detection theory that were not given in [2]. These properties help to provide some mathematical intuition as to why these measures are subjectively useful.
2. PRELIMINARIES


$L_p$ Spaces (see, e.g., Ash [7], Ch. 2)


Let $\mathcal{M}$ denote the space of all measurable complex-valued functions $f$ on $[-\pi,\pi]$. For $p \ge 1$ define $L_p$ as the subspace of $\mathcal{M}$ containing all $f$ for which

$$\|f\|_p \equiv \Bigl\{(2\pi)^{-1}\int_{-\pi}^{\pi}|f(\theta)|^p\,d\theta\Bigr\}^{1/p} < \infty,$$

that is, the integral exists and is finite. If we consider $f$ and $g$ to be equal if

$$\int_{\{\theta:\ f(\theta)\neq g(\theta)\}} d\theta = 0$$

($f = g$ almost everywhere), then $L_p$ is a normed linear space with norm $\|f\|_p$. The norms are successively stronger in the sense that

$$\|f\|_p \le \|f\|_q \quad \text{if } q \ge p. \tag{2.1}$$
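As a quick numerical check of (2.1), the sketch below approximates the normalized norms $\|f\|_p$ with an equally spaced rectangle rule (very accurate for smooth periodic integrands) and shows them increasing with $p$. The test function is a hypothetical choice; the $(2\pi)^{-1}$ normalization is what makes (2.1) hold, since it turns $[-\pi,\pi]$ into a probability space.

```python
import math

def lp_norm(f, p, n=4096):
    """Normalized L_p norm {(2 pi)^-1 * integral |f|^p dtheta}^(1/p) on [-pi, pi],
    approximated by an n-point rectangle rule."""
    total = sum(abs(f(-math.pi + 2.0 * math.pi * k / n)) ** p for k in range(n))
    return (total / n) ** (1.0 / p)

# Hypothetical smooth, nonnegative test function on [-pi, pi].
f = lambda theta: 1.0 + 0.9 * math.cos(theta)

norms = {p: lp_norm(f, p) for p in (1, 2, 3, 4)}
for p, value in norms.items():
    print(f"||f||_{p} = {value:.4f}")
```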

Important inequalities are the Minkowski or triangle inequality

$$\|f+g\|_p \le \|f\|_p + \|g\|_p,$$

the implied inequality

$$\|f-g\|_p \ge \bigl|\,\|f\|_p - \|g\|_p\,\bigr|, \tag{2.2}$$

Hölder's inequality

$$\|fg\|_1 \le \|f\|_p\,\|g\|_q, \qquad \frac{1}{p} + \frac{1}{q} = 1,$$

and its special case, the Cauchy-Schwarz inequality

$$\|fg\|_1 \le \|f\|_2\,\|g\|_2.$$
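These inequalities can be spot-checked on sampled functions. The sketch below, with hypothetical test functions $f$ and $g$ and the arbitrary choice $p = 3$, evaluates each side using the normalized discrete average in place of $(2\pi)^{-1}\int$; because that average is itself an expectation under a uniform distribution, the inequalities hold exactly for the sampled versions as well.

```python
import math

N = 2048
THETA = [-math.pi + 2.0 * math.pi * k / N for k in range(N)]

def norm(values, p):
    """Normalized L_p norm of values sampled on the uniform grid THETA."""
    return (sum(abs(v) ** p for v in values) / len(values)) ** (1.0 / p)

# Two hypothetical sampled functions.
f = [1.0 + 0.5 * math.cos(t) + 0.2 * math.sin(3 * t) for t in THETA]
g = [math.exp(math.cos(2 * t)) for t in THETA]

p = 3.0
q = p / (p - 1.0)                      # conjugate exponent: 1/p + 1/q = 1
fg = [a * b for a, b in zip(f, g)]

minkowski = norm([a + b for a, b in zip(f, g)], p) <= norm(f, p) + norm(g, p)
reverse = abs(norm(f, p) - norm(g, p)) <= norm([a - b for a, b in zip(f, g)], p)   # (2.2)
holder = norm(fg, 1) <= norm(f, p) * norm(g, q)
cauchy_schwarz = norm(fg, 1) <= norm(f, 2) * norm(g, 2)
print(minkowski, reverse, holder, cauchy_schwarz)   # True True True True
```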
Random Processes (Grenander and Szegő [8], Ch. 10; Doob [9], Ch. 10 & 12)

Let $\{X_n\}_{n=-\infty}^{\infty}$ be a real-valued zero mean wide-sense stationary discrete-time random process. We consider discrete time both for simplicity and to be consistent with the speech literature, which focuses on sampled waveforms (assumed sampled at a sufficiently high rate so that negligible distortion occurs). Define the autocorrelation function $r(k) = E\,X_nX_{n-k}$, where $E$ denotes expectation, and assume that

$$\sum_{n=-\infty}^{\infty}|r(n)| < \infty \tag{2.3}$$

so that the power spectral density

$$f(\theta) = \sum_{n=-\infty}^{\infty} r(n)\,e^{-in\theta}, \qquad \theta \in [-\pi,\pi],$$

is well-defined, continuous, even, and $f \in L_1$ (in fact, (2.3) implies that $f$ is bounded). Furthermore,

$$r(n) = (2\pi)^{-1}\int_{-\pi}^{\pi} e^{in\theta} f(\theta)\,d\theta. \tag{2.4}$$

When we begin with a spectral density and form $r$ as above, we will often denote it by $r_f(n)$.
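For a concrete instance of (2.3) and (2.4), consider a first order autoregressive process $X_n = aX_{n-1} + Z_n$ with unit-variance white $Z_n$ and $|a| < 1$ (a hypothetical example, not one used in the text), for which $r(k) = a^{|k|}/(1-a^2)$ and $f(\theta) = 1/|1 - ae^{-i\theta}|^2$. The sketch sums the series for $f$ and then recovers $r(n)$ from (2.4) by numerical integration.

```python
import cmath
import math

A = 0.8  # hypothetical AR(1) coefficient; |A| < 1 makes sum |r(n)| finite, as (2.3) requires

def r_true(k):
    """Autocorrelation of X_n = A X_{n-1} + Z_n with unit-variance white Z_n."""
    return A ** abs(k) / (1.0 - A * A)

def f_from_r(theta, terms=200):
    """f(theta) = sum_n r(n) e^{-i n theta}, truncated to |n| <= terms."""
    s = sum(r_true(n) * cmath.exp(-1j * n * theta) for n in range(-terms, terms + 1))
    return s.real

def f_closed(theta):
    """Closed form 1 / |1 - A e^{-i theta}|^2 of the same spectral density."""
    return 1.0 / abs(1.0 - A * cmath.exp(-1j * theta)) ** 2

def r_from_f(n, f, grid=1024):
    """(2.4): r(n) = (2 pi)^-1 * integral of e^{i n theta} f(theta), rectangle rule."""
    thetas = [-math.pi + 2.0 * math.pi * k / grid for k in range(grid)]
    return (sum(cmath.exp(1j * n * t) * f(t) for t in thetas) / grid).real

for theta in (0.0, 1.0, math.pi):
    print(f"theta={theta:.3f}  series={f_from_r(theta):.6f}  closed={f_closed(theta):.6f}")
for n in range(4):
    print(f"n={n}  r(n)={r_true(n):.6f}  via (2.4)={r_from_f(n, f_closed):.6f}")
```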

By standard arguments from the theory of random processes we can go the other way, that is, given a nonnegative real-valued even $f \in L_1$ there exists a random process $\{X_n\}$ having $f$ as spectral density and $r(n)$ of (2.4) as an autocorrelation function. Thus we can define the space $\mathcal{S}$ of all power spectral densities as the collection of all real-valued, nonnegative, even $f \in L_1$.

A process $\{X_n\}$ is said to be white if $EX_n = 0$ and $r(n) = r(0)\delta_n$ ($\delta_n = 1$ for $n = 0$, 0 otherwise), in which case $f(\theta) = r(0)$, $\theta \in [-\pi,\pi]$.
Linear Filters

A (causal and time-invariant) linear filter is described by a $\delta$-response (response to a Kronecker $\delta$) $h_k$, $k = 0,1,\ldots$. If the filter input is $\{X_n\}$, then the output process $\{Y_n\}$ is described by the discrete convolution

$$Y_n = \sum_{k=0}^{\infty} h_k X_{n-k},$$

where the sum exists as a limit in the mean if

$$\sum_{k=0}^{\infty} h_k^2 < \infty.$$

If

$$\sum_{k=0}^{\infty} |h_k| < \infty,$$

the filter is said to be stable. A filter is also described by its transfer function $H(e^{i\theta})$, where

$$H(z) = \sum_{k=0}^{\infty} h_k z^{-k}.$$

We define $h_0$ as the gain of the filter. If $h_0 = 1$, the filter is called monic. Any filter with $h_0 \neq 0$ can be written as the cascade of a monic filter and a gain. Both $H$ and $h$ will be referred to as filters. Given two filters $h$ and $g$, the cascade filter $d$ (or $D$) is defined by the convolution

$$d_n = \sum_{k=0}^{n} h_k g_{n-k},$$

or $D(z) = H(z)G(z)$. Note that since the filters are causal, we have that

$$d_0 = h_0 g_0. \tag{2.5}$$
If $g$ is such that $d_n = \delta_n$ (or, equivalently, $H(z)G(z) = 1$), we say that $g$ is the inverse filter of $h$ (or vice versa). Note that if $h$ and $g$ are inverse filters, then (2.5) implies that

$$h_0 g_0 = 1. \tag{2.6}$$
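Relations (2.5) and (2.6) also suggest how an inverse filter can be computed term by term: setting $d_n = \delta_n$ and solving for $g_n$ gives $g_0 = 1/h_0$ and $g_n = -h_0^{-1}\sum_{k=1}^{n} h_k g_{n-k}$ for $n \ge 1$. The sketch below, with a hypothetical two-tap filter, implements this recursion and the cascade convolution; it only produces the first few taps, so it says nothing about whether the full inverse is stable.

```python
def cascade(h, g):
    """Cascade (convolution) of two causal filters: d_n = sum_{k=0}^{n} h_k g_{n-k}."""
    d = [0.0] * (len(h) + len(g) - 1)
    for i, hi in enumerate(h):
        for j, gj in enumerate(g):
            d[i + j] += hi * gj
    return d

def inverse_filter(h, n_taps):
    """First n_taps coefficients of the causal inverse g of h (requires h_0 != 0).

    Solves d_n = delta_n term by term: g_0 = 1/h_0, which is (2.6), and
    g_n = -(1/h_0) * sum_{k=1}^{n} h_k g_{n-k} for n >= 1.
    """
    if h[0] == 0:
        raise ValueError("h_0 = 0: no causal inverse")
    g = [1.0 / h[0]]
    for n in range(1, n_taps):
        acc = sum(h[k] * g[n - k] for k in range(1, min(n, len(h) - 1) + 1))
        g.append(-acc / h[0])
    return g

h = [2.0, -1.0]             # hypothetical filter with gain h_0 = 2: H(z) = 2 - z^{-1}
g = inverse_filter(h, 6)    # taps of G(z) = 1 / H(z)
d = cascade(h, g)
print(g)                    # [0.5, 0.25, 0.125, 0.0625, 0.03125, 0.015625]
print(d[:6])                # [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]; a tail term follows from truncation
```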


If a random process $\{X_n\}$ with spectral density $f$ is input to a filter $H$, then the output process has power spectral density $f(\theta)|H(e^{i\theta})|^2$. In particular, if the input process is white with $f(\theta) = r(0) = \sigma^2$, then the output process has power spectral density $\sigma^2|H(e^{i\theta})|^2$. The spectral factorization theorem states that all nondeterministic processes have a second order model of this form with $H$ monic. A process $\{X_n\}$ with spectral density $f$ is nondeterministic if

$$\int_{-\pi}^{\pi} \ln f(\theta)\,d\theta > -\infty. \tag{2.7}$$

To be precise, the spectral factorization theorem states that a process is nondeterministic if and only if the spectral density $f$ has the following form:

$$f(\theta) = |f^+(\theta)|^2,$$

where

$$\begin{aligned}
f^+(\theta) &= \sigma_f B(e^{i\theta}), \\
B(z) &= \sum_{k=0}^{\infty} b_k z^{-k} \neq 0, \qquad |z| > 1, \\
b_0 &= 1, \qquad \sum_{k=0}^{\infty} |b_k|^2 < \infty, \\
\sigma_f^2 &= \exp\Bigl((2\pi)^{-1}\int_{-\pi}^{\pi} \ln f(\theta)\,d\theta\Bigr);
\end{aligned}$$
that is, $B(z)$ is analytic and nonzero for $|z| > 1$, $B$ is a monic filter, and $f^+$ is a causal filter with gain $\sigma_f$. Intuitively, a process is nondeterministic if and only if it can be represented to second order by a one-sided or causal moving average

$$X_n = \sum_{k=0}^{\infty} \sigma_f b_k Z_{n-k},$$

where $\{Z_n\}$ is white with $EZ_n^2 = 1$. The white process $\{\sigma_f Z_n\}$ is called the innovations process of $\{X_n\}$. In other words, the white process which drives a causal monic filter to produce $\{X_n\}$ is the innovations of $\{X_n\}$.
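The gain formula $\sigma_f^2 = \exp\bigl((2\pi)^{-1}\int \ln f\,d\theta\bigr)$ can be checked numerically on two hypothetical first order moving average spectra. For $f(\theta) = |1 + 0.5e^{-i\theta}|^2$ the factor $B(z) = 1 + 0.5z^{-1}$ is already monic with its zero inside the unit circle, so $\sigma_f^2 = 1$. For $f(\theta) = |1 + 2e^{-i\theta}|^2 = 4|1 + 0.5e^{-i\theta}|^2$, the factor $1 + 2z^{-1}$ vanishes at $z = -2$, violating $B(z) \neq 0$ for $|z| > 1$, so the factorization must use $B(z) = 1 + 0.5z^{-1}$ with $\sigma_f^2 = 4$.

```python
import cmath
import math

def sigma2_f(f, grid=4096):
    """sigma_f^2 = exp((2 pi)^-1 * integral of ln f(theta)), by the rectangle rule."""
    thetas = [-math.pi + 2.0 * math.pi * k / grid for k in range(grid)]
    return math.exp(sum(math.log(f(t)) for t in thetas) / grid)

# Hypothetical MA(1) spectra driven by unit-variance white noise.
f_min_phase = lambda t: abs(1 + 0.5 * cmath.exp(-1j * t)) ** 2   # B(z) = 1 + 0.5 z^-1
f_max_phase = lambda t: abs(1 + 2.0 * cmath.exp(-1j * t)) ** 2   # zero at z = -2

print(sigma2_f(f_min_phase))   # about 1.0: already monic and minimum phase
print(sigma2_f(f_max_phase))   # about 4.0: f = 4 |1 + 0.5 e^{-i theta}|^2
```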

n 

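We make for later convenience the assumption that $1/f \in L_1$. This implies (2.7) from Jensen's inequality.

By assumption $f \in L_1$, and hence from Jensen's inequality we have

$$\sigma_f^2 \le (2\pi)^{-1}\int_{-\pi}^{\pi} f(\theta)\,d\theta = r(0) < \infty \tag{2.8}$$

with equality if and only if $f(\theta) = r(0)$, $\theta \in [-\pi,\pi]$, i.e., the process is white and equals its own innovations. Since $\sigma_f^2 < \infty$ (so that $1/f$ also satisfies (2.7)), $f(\theta)$ can also be expressed in factored form with

$$\begin{aligned}
f^+(\theta) &= \sigma_f / C(e^{i\theta}), \\
C(z) &= \sum_{k=0}^{\infty} c_k z^{-k} \neq 0, \qquad |z| > 1, \\
c_0 &= 1, \qquad \sum_{k=0}^{\infty} |c_k|^2 < \infty,
\end{aligned}$$

with $\sigma_f$ as before. This yields a one-sided autoregressive second order model of the form

$$X_n = -\sum_{k=1}^{\infty} c_k X_{n-k} + \sigma_f Z_n.$$

Comparison with the moving average model shows that $B$ and $C$ are inverse filters.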

Denote by $\mathcal{T}$ the class of spectral densities (members of $\mathcal{S}$) for which $1/f \in L_1$. Thus if $f \in \mathcal{T}$, it is nondeterministic (and so is $1/f$) and has both moving average and autoregressive models. Note also that $\sigma_{1/f}^2 = 1/\sigma_f^2$.

A crucial facet of the preceding models is their causality. Any power spectral density $f \in \mathcal{S}$ can be modeled to second order as the output of the noncausal filter $f^{1/2}$ (positive square root) driven by $\{Z_n\}$, yielding a two-sided moving average representation $X_n = \sum_{k=-\infty}^{\infty} \beta_k Z_{n-k}$, where the $\beta_k$ are the Fourier coefficients of $f^{1/2}$. Hence we will refer to $f^+$ as a causal model and $f^{1/2}$ as a noncausal model for $f$. Note that $f^{1/2}$ exists more generally and that if both exist, $f^{1/2} = |f^+|$.
Linear Prediction (Grenander and Szegő [8], Ch. 10; Doob [9], Ch. 12; Gray and Markel [2])

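For a finite integer $m$ and a process $\{X_n\}$, form a one-step linear predictor of $X_n$ of the form

$$\hat X_n = \sum_{k=1}^{m} h_k X_{n-k}$$

with average squared error

$$E(\epsilon_n^2) = E\bigl((X_n - \hat X_n)^2\bigr) = E\Bigl(\bigl(X_n - \sum_{k=1}^{m} h_k X_{n-k}\bigr)^2\Bigr).$$

Define the monic filter $a_k$ by $a_0 = 1$, $a_k = -h_k$, $k \neq 0$ ($h_k = 0$ if $k < 0$ or $k > m$). We then have

$$E(\epsilon_n^2) = E\Bigl(\bigl|\sum_{k=0}^{m} a_k X_{n-k}\bigr|^2\Bigr) = (2\pi)^{-1}\int_{-\pi}^{\pi} f(\theta)\,|A(e^{i\theta})|^2\,d\theta, \tag{2.9}$$

where $A(z) = 1 - H(z) = 1 - \sum_{k=1}^{m} h_k z^{-k}$. We wish to find the monic $A$ (and hence $H$) that minimizes $E(\epsilon_n^2)$, with the resulting squared error $\sigma_m^2$. The solution is well known to be given by the $A$, say $\hat A$, that solves the linear system of equations

$$\begin{aligned}
\sigma_m^2 &= (2\pi)^{-1}\int_{-\pi}^{\pi} f(\theta)\hat A(e^{i\theta})\,d\theta = \sum_{k=0}^{m} \hat a_k r_f(k), \\
0 &= (2\pi)^{-1}\int_{-\pi}^{\pi} f(\theta)\hat A(e^{i\theta})e^{ij\theta}\,d\theta = \sum_{k=0}^{m} \hat a_k r_f(k-j), \qquad j = 1,2,\ldots,m.
\end{aligned} \tag{2.10}$$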
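This is easily proved by observing that for any monic $m$th order filter $G$, (2.10) implies

$$(2\pi)^{-1}\int_{-\pi}^{\pi} f(\theta)\hat A(e^{i\theta})G(e^{-i\theta})\,d\theta = \sigma_m^2 + \sum_{k=1}^{m} g_k\,(2\pi)^{-1}\int_{-\pi}^{\pi} f(\theta)\hat A(e^{i\theta})e^{ik\theta}\,d\theta = \sigma_m^2, \tag{2.11}$$

and hence for any $m$th order monic filter $A$ we have from (2.9) that

$$\begin{aligned}
(2\pi)^{-1}\int_{-\pi}^{\pi} f(\theta)|A(e^{i\theta})|^2\,d\theta
&= (2\pi)^{-1}\int_{-\pi}^{\pi} f(\theta)\bigl|\hat A(e^{i\theta}) + A(e^{i\theta}) - \hat A(e^{i\theta})\bigr|^2\,d\theta \\
&= \sigma_m^2 + 2\,\mathrm{Re}\Bigl\{(2\pi)^{-1}\int_{-\pi}^{\pi} f(\theta)\hat A(e^{i\theta})\bigl(A(e^{-i\theta}) - \hat A(e^{-i\theta})\bigr)\,d\theta\Bigr\} \\
&\quad + (2\pi)^{-1}\int_{-\pi}^{\pi} f(\theta)\bigl|A(e^{i\theta}) - \hat A(e^{i\theta})\bigr|^2\,d\theta \;\ge\; \sigma_m^2,
\end{aligned} \tag{2.12}$$

where the cross term vanishes because, by (2.11) applied to the monic filters $A$ and $\hat A$, the integral inside the braces equals $\sigma_m^2 - \sigma_m^2 = 0$. Equality holds if and only if $A = \hat A$ (almost everywhere). This is simply the orthogonality principle.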
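The normal equations (2.10) form a Toeplitz system, and one standard way to solve them is the Levinson-Durbin recursion; that choice of solver is ours, not the text's. The sketch below solves (2.10) for a hypothetical first order moving average spectrum $f(\theta) = |1 + 0.5e^{-i\theta}|^2$, whose autocorrelation is $r_f = (1.25, 0.5, 0, 0, \ldots)$, checks the residuals of (2.10), and confirms the conclusion of (2.12) that perturbing the non-leading coefficients of $\hat A$ never lowers the prediction error (2.9).

```python
import random

def levinson_durbin(r, m):
    """Solve (2.10) for the monic order-m filter a_hat given r_f(0), ..., r_f(m).

    Returns (a_hat, sigma2_m) with a_hat[0] == 1.
    """
    a = [1.0]
    err = r[0]
    for order in range(1, m + 1):
        acc = sum(a[k] * r[order - k] for k in range(order))
        kappa = -acc / err                      # reflection coefficient
        ext = a + [0.0]
        a = [ext[k] + kappa * ext[order - k] for k in range(order + 1)]
        err *= 1.0 - kappa * kappa
    return a, err

def prediction_error(a, r):
    """(2.9) as a quadratic form: sum_j sum_k a_j a_k r_f(j - k)."""
    n = len(a)
    return sum(a[j] * a[k] * r[abs(j - k)] for j in range(n) for k in range(n))

m = 3
r = [1.25, 0.5, 0.0, 0.0]          # r_f(0..3) for f = |1 + 0.5 e^{-i theta}|^2
a_hat, sigma2_m = levinson_durbin(r, m)

# Residuals of the second line of (2.10); all should be ~0.
residuals = [sum(a_hat[k] * r[abs(k - j)] for k in range(m + 1)) for j in range(1, m + 1)]

# Perturb only a_1..a_m (the filter stays monic) and record the smallest excess error.
rng = random.Random(0)
worst_gap = min(
    prediction_error([1.0] + [c + rng.uniform(-0.2, 0.2) for c in a_hat[1:]], r) - sigma2_m
    for _ in range(1000)
)
print(a_hat, sigma2_m)
print(residuals, worst_gap)        # residuals ~0, worst_gap >= 0
```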

It is well known that the system of equations (2.10) is equivalent to the system of "correlation matching" equations [10,4]

$$r_f(k) = (2\pi)^{-1}\int_{-\pi}^{\pi} f(\theta)e^{ik\theta}\,d\theta = (2\pi)^{-1}\int_{-\pi}^{\pi} \frac{\sigma_m^2}{|\hat A(e^{i\theta})|^2}\,e^{ik\theta}\,d\theta, \qquad k = 0,1,\ldots,m. \tag{2.13}$$

That (2.13) implies (2.10) is easy; the converse implication seems more difficult to prove. Note that (2.13) implies that for any $m$th order filter $G(\theta) = \sum_{k=0}^{m} g_k e^{-ik\theta}$ we have

$$(2\pi)^{-1}\int_{-\pi}^{\pi} f(\theta)|G(\theta)|^2\,d\theta = (2\pi)^{-1}\int_{-\pi}^{\pi} |G(\theta)|^2\,\frac{\sigma_m^2}{|\hat A(e^{i\theta})|^2}\,d\theta. \tag{2.14}$$
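The correlation matching property (2.13) is easy to observe numerically: fit an order-$m$ predictor to the hypothetical spectrum $f(\theta) = |1 + 0.5e^{-i\theta}|^2$ and compute the autocorrelation of the all-pole model $\sigma_m^2/|\hat A(e^{i\theta})|^2$ via (2.4). The sketch repeats a compact Levinson-Durbin solver (again our choice of solver) so that it runs on its own. The model matches $r_f(k)$ for $k \le m$ and extrapolates differently beyond.

```python
import cmath
import math

def levinson_durbin(r, m):
    """Monic order-m predictor a_hat and error sigma_m^2 from r(0..m); solves (2.10)."""
    a, err = [1.0], r[0]
    for order in range(1, m + 1):
        kappa = -sum(a[k] * r[order - k] for k in range(order)) / err
        ext = a + [0.0]
        a = [ext[k] + kappa * ext[order - k] for k in range(order + 1)]
        err *= 1.0 - kappa * kappa
    return a, err

def model_autocorr(a, s2, k, grid=4096):
    """r(k) of the all-pole model s2 / |A(e^{i theta})|^2 via (2.4), rectangle rule."""
    total = 0.0
    for j in range(grid):
        t = -math.pi + 2.0 * math.pi * j / grid
        A = sum(ak * cmath.exp(-1j * n * t) for n, ak in enumerate(a))
        total += (cmath.exp(1j * k * t) * s2 / abs(A) ** 2).real
    return total / grid

r_f = [1.25, 0.5, 0.0, 0.0]        # hypothetical MA(1) spectrum |1 + 0.5 e^{-i theta}|^2
a_hat, s2 = levinson_durbin(r_f, 2)
model = [model_autocorr(a_hat, s2, k) for k in range(4)]
print([round(v, 6) for v in model])   # first three entries match r_f; the last does not
```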

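The minimum value $\sigma_m^2$ of $E(\epsilon_n^2)$ is given by

$$\sigma_m^2 = \frac{\det T_{m+1}(f)}{\det T_m(f)},$$

where

$$T_m(f) = \Bigl\{(2\pi)^{-1}\int_{-\pi}^{\pi} f(\theta)e^{i(k-j)\theta}\,d\theta;\ k,j = 0,1,\ldots,m-1\Bigr\} = \bigl\{r_f(k-j);\ k,j = 0,1,\ldots,m-1\bigr\}$$

is the $m$th order Toeplitz matrix of the spectral density $f$ (with $\det T_0(f) = 1$). It is well known from the theory of Toeplitz forms (Grenander and Szegő [8]) that $\sigma_m^2 \to \sigma_f^2$ as $m \to \infty$ and that $\sigma_m^2 \ge \sigma_{m+1}^2$.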

A similar argument to the preceding one shows that if $m = \infty$, then $\sigma_\infty^2 = \sigma_f^2$, whence $\sigma_m^2 \ge \sigma_f^2$ for all $m$. In this case, if $f$ has autoregressive model $\sigma_f^2/|A|^2$, then $\hat A = A$ is the best predictor and the resulting $\epsilon_n$ is white with $E\epsilon_n^2 = \sigma_f^2$; that is, passing $\{X_n\}$ through the filter $A$ yields its innovations process, and hence $A$ is called a whitening filter for $f$ ($1/f^+$ and $1/f^{1/2}$ also whiten $f$).
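The determinant formula for $\sigma_m^2$ and its monotone decrease to $\sigma_f^2$ can be illustrated with the hypothetical moving average spectrum $f(\theta) = |1 + be^{-i\theta}|^2$, $b = 0.5$, for which $\sigma_f^2 = 1$ (the factor $1 + bz^{-1}$ is monic with its zero inside the unit circle). The sketch uses plain Gaussian elimination for the determinants, which is adequate for these small, well-conditioned matrices, together with the convention $\det T_0(f) = 1$.

```python
def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting (det of [] is 1)."""
    m = [row[:] for row in matrix]
    n, d = len(m), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(m[i][c]))
        if m[p][c] == 0.0:
            return 0.0
        if p != c:
            m[c], m[p] = m[p], m[c]
            d = -d
        d *= m[c][c]
        for i in range(c + 1, n):
            factor = m[i][c] / m[c][c]
            for j in range(c, n):
                m[i][j] -= factor * m[c][j]
    return d

def toeplitz(r, n):
    """T_n(f) = {r_f(k - j); k, j = 0..n-1} for an even autocorrelation r."""
    return [[r[abs(k - j)] for j in range(n)] for k in range(n)]

b = 0.5                                  # hypothetical MA(1): f = |1 + b e^{-i theta}|^2
r = [1.0 + b * b, b] + [0.0] * 20        # r_f(0), r_f(1), then zeros
sigma2_f = 1.0                           # gain of the monic minimum-phase factor
sigma2 = [det(toeplitz(r, m + 1)) / det(toeplitz(r, m)) for m in range(12)]
print([round(s, 6) for s in sigma2])     # decreasing from 1.25 toward sigma_f^2 = 1
```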
3. SPECTRAL RATIO DISTORTION MEASURES

In this section we consider basic properties of distortion measures on $\mathcal{T}$. In the next section several specific examples are introduced and compared.

A distortion measure is a generalization of the notion of a distance or metric. A distortion measure $d(\cdot,\cdot)$ on $\mathcal{T}$ is simply an assignment of a nonnegative extended real number to each pair $f, g$ in $\mathcal{T}$. Intuitively, $d(f,g)$ represents the distortion or cost or "badness of approximation" of reproducing $f$ as $g$. Without loss of generality we can assume that $d(f,f) = 0$ (see, e.g., Berger [11]).

Distortion measures may or may not have the following properties. A distortion measure is (a) symmetric if $d(f,g) = d(g,f)$, all $f,g \in \mathcal{T}$; (b) finite-valued if $d(f,g) < \infty$, all $f,g \in \mathcal{T}$; (c) positive definite if $d(f,g) = 0$ means $f = g$ (almost everywhere); (d) metric (actually, pseudo-metric) if $d(f,g) \le d(f,h) + d(h,g)$, all $f,g,h \in \mathcal{T}$.

A distortion measure is called a distance or a metric if it has all of these properties. Metrics have additional structure over general distortion measures, but most basic theoretical results for distortion measures, such as information theoretic optimal performance bounds, do not require (a), (c), or (d). They do require (b) (at least with probability one, since communication with finite average distortion is otherwise impossible). In particular, nonsymmetric distortion measures may not be as easy to work with, but they have no inherent mathematical drawbacks for communications theory and may in fact be more appropriate for certain situations. The metric property, however, is quite useful since it allows us to conclude that if in a given communications system the reproduction is produced in two steps and each step results in small distortion, then the overall distortion will also be small.

Some of the distortion measures here will have similar properties since they are defined in terms of norms.

Given two distortion measures $d_1$ and $d_2$ on a common space $\mathcal{T}$, we shall be concerned with which is "stronger" or "weaker." We say that $d_1$ is stronger than $d_2$, or implies $d_2$, and write $d_1 \Rightarrow d_2$ if small enough distortion under $d_1$ implies that $d_2$ is also small, that is, given $f \in \mathcal{T}$ and $\epsilon > 0$ there is a $\delta > 0$ such that if $d_1(f,g) \le \delta$, then $d_2(f,g) \le \epsilon$. If $d_1 \Rightarrow d_2$ and $d_2 \Rightarrow d_1$, we say that $d_1$ and $d_2$ are equivalent and write $d_1 \Leftrightarrow d_2$. Intuitively, equivalent distortion measures yield the same notions of "good" and "bad" performance even though their numerical values may differ. For example, $\|f-g\|_2$ (which is a metric) and $\|f-g\|_2^2$ (which is not) are obviously equivalent distortion measures. Clearly this is actually an equivalence relation in the sense that $d_1 \Leftrightarrow d_2$ and $d_2 \Leftrightarrow d_3$ imply $d_1 \Leftrightarrow d_3$. The intuitive importance of equivalence lies in the fact that if a distortion measure $d$ is subjectively meaningful, then so are all other distortion measures equivalent to $d$, since small and large values yield the same notion of "good" and "bad"; only the numerical requirements of small and large change.

In some cases it is useful to define distortion measures in terms of other distortion measures. For example, given distortion measures $d_1(f,g)$ and $d_2(f,g)$ on $\mathcal{T}$, one can define

$$d^{(q)}(f,g) = \bigl(d_1(f,g)^q + d_2(f,g)^q\bigr)^{1/q}, \tag{3.1}$$
where $q \ge 1$ is a parameter. Sometimes a scaling is also included. The special cases $q = 1$ and $2$ are the most common, but general $q$ are sometimes considered, as most of the properties remain true and a greater variety of distortion measures is thereby included. The parameter $q$ can be chosen for convenience since, as we now show, the distortion measures $d^{(q)}(f,g)$ are all equivalent for fixed $d_1$ and $d_2$. From Ash [7], pp. 83-88, we have that

$$a^q + b^q \le (a+b)^q \le 2^{q-1}(a^q + b^q), \qquad a,b \ge 0,\ q \ge 1, \tag{3.2}$$

and hence

$$d^{(q)}(f,g) \le d_1(f,g) + d_2(f,g) \le 2^{(q-1)/q}\,d^{(q)}(f,g), \tag{3.3}$$

and therefore $d^{(q)} \Leftrightarrow d^{(1)}$ for all $q$. Note that $d^{(q)} \Rightarrow d_i$, $i = 1,2$, but that it may not be true, for example, that $d_1 \Rightarrow d^{(q)}$.
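A randomized spot check of the sandwich (3.3), with arbitrary nonnegative numbers standing in for $d_1(f,g)$ and $d_2(f,g)$ and random $q \ge 1$; the ranges and sample count are hypothetical choices. Both bounds are tight: the left one when $d_2 = 0$ and the right one when $d_1 = d_2$.

```python
import random

def combine(d1, d2, q):
    """d^(q) = (d1^q + d2^q)^(1/q), as in (3.1)."""
    return (d1 ** q + d2 ** q) ** (1.0 / q)

rng = random.Random(0)
violations = 0
for _ in range(10000):
    d1, d2 = rng.uniform(0.0, 5.0), rng.uniform(0.0, 5.0)   # stand-ins for d_1, d_2
    q = rng.uniform(1.0, 6.0)
    dq = combine(d1, d2, q)
    if not (dq <= d1 + d2 + 1e-12 and d1 + d2 <= 2 ** ((q - 1) / q) * dq + 1e-12):
        violations += 1

q = 3.0
print(violations)                                            # 0
print(combine(2.0, 0.0, q), 2.0 + 0.0)                       # left bound attained (d_2 = 0)
print(2.0 + 2.0, 2 ** ((q - 1) / q) * combine(2.0, 2.0, q))  # right bound attained (d_1 = d_2)
```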

An example of the previous construction is to symmetrize a nonsymmetric distortion measure. A nonsymmetric distortion measure can be symmetrized in a number of ways. The most common is to define $d_1(f,g) = d(f,g)$ and $d_2(f,g) = d(g,f)$ and use (3.1), that is, to define

$$d^{(q)}(f,g) = \bigl(d(f,g)^q + d(g,f)^q\bigr)^{1/q}. \tag{3.4}$$

Equation (3.3) implies that the $d^{(q)}$ are equivalent for all $q$.

A common useful class of distortion measures are difference distortion measures, having the following form: $d(f,g)$ is a difference distortion measure on $\mathcal{T}$ if there is a function $\varphi: (-\infty,\infty) \to [0,\infty)$ such that $\varphi(0) = 0$, $\varphi(x) \ge 0$, and

$$d(f,g) = \|\varphi(f-g)\|_p, \tag{3.5}$$

where $f - g$ is well-defined since $f$ and $g$ are members of a linear space. It is usually assumed that $\varphi(|x|)$ is nondecreasing with $|x|$ or, more strongly, that $\varphi(x)$ is convex $\cup$ (for example, $\varphi(x) = |x|^q$, $q \ge 1$). An alternate class of distortion measures, sometimes also referred to as difference distortion measures, reverses the roles of norm and $\varphi$ and sets

$$d'(f,g) = \varphi(\|f-g\|_p), \tag{3.6}$$

with $\varphi$ having the above properties. We shall call this a norm-difference distortion measure.

Most distortion measures arising in speech applications, however, are not difference measures. Instead they are ratio distortion measures having the following form: Let $\varphi: [0,\infty) \to [0,\infty)$ satisfy $\varphi(1) = 0$, $\varphi(x) \ge 0$, and $\varphi(x)$ is a convex $\cup$ function of $x$ (with a minimum now at 1 instead of 0). A distortion measure of the form

$$d_\varphi(f,g) = \|\varphi(f/g)\|_p$$

is called a ratio distortion measure on $\mathcal{T}$. The subscript $\varphi$ will often be replaced by a mnemonic. We also consider gain-normalized distortion measures of the form

$$d_{n\varphi}(f,g) = d_\varphi\bigl(f/\sigma_f^2,\ g/\sigma_g^2\bigr) = \Bigl\|\varphi\Bigl(\frac{f/\sigma_f^2}{g/\sigma_g^2}\Bigr)\Bigr\|_p,$$

where the subscript "n" is an abbreviation for "normalized." Note that a ratio distortion measure can also be considered as a difference distortion measure on the space $\mathcal{T}'$ of all functions of the form $\ln f$, $f \in \mathcal{T}$, but we work with $\mathcal{T}$ as our basic space since it retains the basic structural properties of spectra. Analogous to (3.6), we can also have a norm-ratio distortion measure of the form

$$d'_\varphi(f,g) = \varphi(\|f/g\|_p).$$

Note that for $p = 1$ we have from Jensen's inequality that

$$d'_\varphi(f,g) = \varphi(\|f/g\|_1) \le \|\varphi(f/g)\|_1 = d_\varphi(f,g), \tag{3.7}$$

and therefore, for $L_1$ norms,

$$d_\varphi \Rightarrow d'_\varphi. \tag{3.8}$$
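A small numerical illustration of (3.7), assuming the convex ratio function $\varphi(x) = x - 1 - \ln x$ (the choice that reappears in the next section) and two hypothetical positive spectra sampled on a uniform grid. With the normalized average in place of $(2\pi)^{-1}\int$, Jensen's inequality holds exactly for the sampled values too.

```python
import math

def phi(x):
    """Convex ratio cost with phi(1) = 0 and phi >= 0: phi(x) = x - 1 - ln x."""
    return x - 1.0 - math.log(x)

N = 4096
thetas = [-math.pi + 2.0 * math.pi * k / N for k in range(N)]
f = [1.0 / (1.25 - math.cos(t)) for t in thetas]      # hypothetical spectra
g = [1.0 + 0.3 * math.cos(2 * t) for t in thetas]
ratio = [a / b for a, b in zip(f, g)]                  # f/g > 0, so ||f/g||_1 is its mean

norm_ratio = phi(sum(ratio) / N)                       # d'_phi(f,g) = phi(||f/g||_1)
ratio_measure = sum(phi(x) for x in ratio) / N         # d_phi(f,g) = ||phi(f/g)||_1
print(norm_ratio, ratio_measure)                       # first <= second
```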


A variation on the ratio distortion measure that occurs in speech processing is the gain-optimized distortion measure. Here we begin with a ratio distortion measure $d_\varphi$, but the dependence on the reproduction gain $\sigma_g^2$ is removed by replacing it with a gain $\sigma^2$ that minimizes $d_\varphi$. This is usually done for one of two reasons. First, we may ignore the original gain of a reproduction symbol and replace it by a gain chosen to minimize the given distortion measure. Second, removing the dependence of the distortion measure $d_\varphi$ on a reproduction parameter such as the gain gives us the freedom to use a different distortion measure on the gains. We thus define a gain-optimized distortion measure

$$d_\varphi^{o}(f,g) = \inf_{\sigma^2 > 0} d_\varphi\bigl(f,\ \sigma^2 g/\sigma_g^2\bigr).$$

If the infimum is a minimum, the minimizing $\sigma^2$ is called the optimum reproduction gain. Note that we remove the original reproduction gain by normalizing $g$ and replace it by the new gain $\sigma^2$.
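Since the infimum is over a single parameter, a generic sketch can search over $s = \ln\sigma^2$, here by golden-section search, which assumes the objective is unimodal in $s$. For the specific choice $\varphi(x) = x - 1 - \ln x$ (again an assumption for illustration), setting the derivative of $d_\varphi(f, \sigma^2 h)$ with respect to $\sigma^2$ to zero gives the closed form $\sigma^2 = (2\pi)^{-1}\int f/h\,d\theta$ with $h = g/\sigma_g^2$, which the search should reproduce. The spectra and search bounds are hypothetical.

```python
import math

N = 1024
TH = [-math.pi + 2.0 * math.pi * k / N for k in range(N)]

def mean(values):
    return sum(values) / len(values)

def d_ratio(f, g, phi):
    """Ratio distortion ||phi(f/g)||_1 for sampled, positive spectra."""
    return mean([phi(a / b) for a, b in zip(f, g)])

def gain_optimized(f, g, phi, lo=-10.0, hi=10.0, iters=100):
    """inf over sigma^2 > 0 of d_phi(f, sigma^2 g / sigma_g^2).

    Golden-section search on s = ln sigma^2 in [lo, hi]; assumes unimodality.
    Returns (minimum value, minimizing sigma^2, normalized reproduction g / sigma_g^2).
    """
    sigma2_g = math.exp(mean([math.log(v) for v in g]))
    h = [v / sigma2_g for v in g]
    cost = lambda s: d_ratio(f, [math.exp(s) * v for v in h], phi)
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    for _ in range(iters):
        c, d = b - ratio * (b - a), a + ratio * (b - a)
        if cost(c) <= cost(d):
            b = d
        else:
            a = c
    s = 0.5 * (a + b)
    return cost(s), math.exp(s), h

phi_is = lambda x: x - 1.0 - math.log(x)
f = [1.0 / (1.25 - math.cos(t)) for t in TH]          # hypothetical input spectrum
g = [3.0 * (1.0 + 0.4 * math.cos(t)) for t in TH]     # hypothetical reproduction spectrum

d_opt, gain, h = gain_optimized(f, g, phi_is)
closed_form = mean([a / b for a, b in zip(f, h)])     # stationary point, valid for this phi only
print(gain, closed_form)
print(d_opt, d_ratio(f, g, phi_is))                   # optimized value <= value at original gain
```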
Alternatively, we might combine the $\varphi$ functions before taking the norm, for example forming

$$\varphi^*(f/g) = \tfrac{1}{2}\bigl(\varphi_1(f/g) + \varphi_2(f/g)\bigr)$$

(the 1/2 is for convenience), and then define

$$d^*(f,g) = \|\varphi^*(f/g)\|_p = \Bigl\|\tfrac{1}{2}\bigl(\varphi_1(f/g) + \varphi_2(f/g)\bigr)\Bigr\|_p = \tfrac{1}{2}\|\varphi_1(f/g) + \varphi_2(f/g)\|_p. \tag{3.12}$$

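The measures of (3.11) and (3.12) are related for $q = 1$ by

$$\begin{aligned}
\tfrac{1}{2}d^{(1)}(f,g) &= \tfrac{1}{2}\bigl(\|\varphi_1(f/g)\|_p + \|\varphi_2(f/g)\|_p\bigr) \le \max\bigl(\|\varphi_1(f/g)\|_p,\ \|\varphi_2(f/g)\|_p\bigr) \\
&\le \|\varphi_1(f/g) + \varphi_2(f/g)\|_p = 2d^*(f,g) \le \|\varphi_1(f/g)\|_p + \|\varphi_2(f/g)\|_p = d^{(1)}(f,g),
\end{aligned}$$

where the second inequality uses $\varphi_1, \varphi_2 \ge 0$ and the third is Minkowski's inequality. Hence, since $d^{(1)} \Leftrightarrow d^{(q)}$, we have that

$$d^* \Leftrightarrow d^{(q)}, \qquad q = 1,2,\ldots, \tag{3.13}$$

that is, both forms of symmetrized measures are equivalent. In particular, if one wishes to symmetrize a nonsymmetric distortion measure $\|\varphi(f/g)\|_p$, then

$$\bigl(\|\varphi(f/g)\|_p^q + \|\varphi(g/f)\|_p^q\bigr)^{1/q} \Leftrightarrow \tfrac{1}{2}\|\varphi(f/g) + \varphi(g/f)\|_p. \tag{3.14}$$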
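The sandwich behind (3.13), $\frac{1}{4}d^{(1)} \le d^* \le \frac{1}{2}d^{(1)}$, can be checked on sampled spectra. The sketch assumes $\varphi_1(f/g) = \varphi(f/g)$ and $\varphi_2(f/g) = \varphi(g/f)$ with $\varphi(x) = x - 1 - \ln x$, i.e., the symmetrization in (3.14), and two hypothetical spectra. For $p = 1$ the upper bound holds with equality because both functions are nonnegative, so larger values of $p$ are included to make the check less trivial.

```python
import math

N = 2048
TH = [-math.pi + 2.0 * math.pi * k / N for k in range(N)]

def lp(values, p):
    """Normalized L_p norm of sampled values."""
    return (sum(abs(v) ** p for v in values) / len(values)) ** (1.0 / p)

phi = lambda x: x - 1.0 - math.log(x)
f = [1.0 / (1.25 - math.cos(t)) for t in TH]          # hypothetical spectra
g = [1.0 + 0.5 * math.cos(3 * t) for t in TH]
phi1 = [phi(a / b) for a, b in zip(f, g)]             # phi_1(f/g) = phi(f/g)
phi2 = [phi(b / a) for a, b in zip(f, g)]             # phi_2(f/g) = phi(g/f)

tol = 1e-12                                           # slack for floating-point rounding
results = {}
for p in (1, 2, 4):
    d1 = lp(phi1, p) + lp(phi2, p)                                  # d^(1), i.e. (3.1) with q = 1
    d_star = 0.5 * lp([u + v for u, v in zip(phi1, phi2)], p)      # d* of (3.12)
    results[p] = d1 / 4 - tol <= d_star <= d1 / 2 + tol
    print(p, round(d1, 6), round(d_star, 6), results[p])
```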

Obviously other alternatives to (3.12) exist for forming $\varphi$ from $\varphi_1$ and $\varphi_2$; e.g., $\varphi(f/g) = \{\varphi_1(f/g)\varphi_2(f/g)\}^{1/2}$ yields another ratio distortion measure. The arithmetic mean of (3.12), however, seems the most useful.

Another form of implication and equivalence of distortion measures is the following. We say that $d_1$ is stronger in a coding sense than $d_2$ and write $d_1 \supset d_2$ if for each $f, g, g' \in \mathcal{T}$ we have that $d_1(f,g) \le d_1(f,g')$ implies that $d_2(f,g) \le d_2(f,g')$, i.e., if $g'$ is a worse reproduction of $f$ than $g$ is under $d_1$, then it is also worse under $d_2$. If $d_1 \supset d_2$ and $d_1 \subset d_2$, we write $d_1 \subset\!\supset d_2$ and say that $d_1$ and $d_2$ are coding equivalent. The name and application of this concept arise in the following coding or quantization problem. Consider $f \in \mathcal{T}$ as a symbol in an alphabet $\mathcal{T}$ and let $\hat{\mathcal{T}}$ be a subset of $\mathcal{T}$ called a reproduction space or codebook (usually $\hat{\mathcal{T}}$ has a finite number of members). Given a distortion measure $d_i$, define the minimum distortion quantizer (or coder) $\hat f_i: \mathcal{T} \to \hat{\mathcal{T}}$ by
$$\hat f_i(f) = g \in \hat{\mathcal{T}} \quad \text{if } d_i(f,g) \le d_i(f,g'), \ \text{all } g' \in \hat{\mathcal{T}},$$

with some tie-breaking rule. Thus $\hat f_i$ picks the closest or minimum distortion (under $d_i$) reproduction symbol to $f$. If $d_1 \supset d_2$, then

$$d_2\bigl(f, \hat f_1(f)\bigr) = d_2\bigl(f, \hat f_2(f)\bigr),$$

that is, a closest reproduction symbol $\hat f_1(f)$ under $d_1$ is also a closest reproduction under $d_2$. (It may or may not be true that $\hat f_1(f) = \hat f_2(f)$, depending on the tie-breaking rule.) If $d_1 \subset\!\supset d_2$, then the tie-breaking rules can be chosen so that $\hat f_1(f) = \hat f_2(f)$, that is, coding equivalent distortion measures result in the same code. Note, however, that the code may be "good" under one distortion measure yet "bad" under another in the sense of average performance.
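A small sketch of a minimum distortion coder over a hypothetical five-word codebook of first order autoregressive spectra with unit innovations. It uses the Itakura-Saito measure of the next section as $d_1$ and, as $d_2$, the strictly increasing transform $\ln(1 + d_1)$, which is coding equivalent to $d_1$ in the sense just defined. Python's `min()` returns the first minimizer, which serves as the tie-breaking rule. For these spectra one can check that $d_1$ between parameters $a$ and $b$ reduces to $(a-b)^2/(1-a^2)$, so the codes follow the nearest codebook parameter; the inputs were chosen to avoid exact ties.

```python
import math

N = 1024
TH = [-math.pi + 2.0 * math.pi * k / N for k in range(N)]

def d_is(f, g):
    """Itakura-Saito ratio distortion ||f/g - 1 - ln(f/g)||_1 on sampled spectra."""
    return sum(a / b - 1.0 - math.log(a / b) for a, b in zip(f, g)) / len(f)

def d_is_log(f, g):
    """Strictly increasing transform of d_is, hence coding equivalent to it."""
    return math.log1p(d_is(f, g))

def quantize(f, codebook, d):
    """Minimum distortion coder: index of the closest codeword under d (first wins ties)."""
    return min(range(len(codebook)), key=lambda i: d(f, codebook[i]))

def ar1(a):
    """Sampled AR(1) spectrum 1 / |1 - a e^{-i theta}|^2 (hypothetical codewords and inputs)."""
    return [1.0 / (1.0 - 2.0 * a * math.cos(t) + a * a) for t in TH]

codebook = [ar1(a) for a in (-0.6, -0.2, 0.2, 0.6, 0.9)]
inputs = [ar1(a) for a in (-0.5, 0.05, 0.35, 0.7, 0.95)]
codes_1 = [quantize(f, codebook, d_is) for f in inputs]
codes_2 = [quantize(f, codebook, d_is_log) for f in inputs]
print(codes_1, codes_2)   # [0, 2, 2, 3, 4] [0, 2, 2, 3, 4]
```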




4.  EXAMPLES 

In this section several examples of spectral ratio distortion measures introduced in the speech literature, along with some other related measures, are defined and motivated. In the next section their properties and interrelations are developed. We begin by listing the various measures along with comments on each. In each case we set $d(f,g) = \infty$ if the given integral does not exist.

1) The Itakura-Saito Distortion Measure

$$d_{IS}(f,g) = \|f/g - 1 - \ln(f/g)\|_1 \tag{4.1}$$

This is a ratio distortion measure with $\varphi(x) = x - 1 - \ln x$. This distance was introduced by Itakura and Saito [12] and has the following property: for fixed $f$ and the class $\mathcal{T}_m = \{\text{all } g \in \mathcal{T} \text{ such that } g(\theta) = \sigma^2/|A(e^{i\theta})|^2,\ A(z) = \sum_{k=0}^{m} a_k z^{-k},\ a_0 = 1\}$, the $g \in \mathcal{T}_m$ minimizing $d_{IS}(f,g)$ is $g(\theta) = \sigma_m^2/|\hat A(e^{i\theta})|^2$, where $\hat A$ is defined by (2.10) and yields the minimum prediction error over all $m$th order prediction filters $H = 1 - A$. Itakura and Saito also showed that if the underlying process is assumed Gaussian, then minimizing $d_{IS}$ is approximately equivalent to finding a maximum likelihood estimate of $A$ given a sample power spectral density $f$.
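The first property can be seen numerically for $m = 1$. The sketch takes the hypothetical input spectrum $f(\theta) = |1 + 0.5e^{-i\theta}|^2$ (so $r_f(0) = 1.25$, $r_f(1) = 0.5$), scans a grid of monic first order filters $A(z) = 1 + cz^{-1}$ with $|c| < 1$, gives each candidate its best gain $\sigma^2 = (2\pi)^{-1}\int f|A|^2\,d\theta$ (the Itakura-Saito optimum gain for a fixed minimum-phase $A$, a short calculus step), and keeps the candidate with the smallest $d_{IS}$. The winner should coincide with the solution of (2.10) for $m = 1$, namely $\hat a_1 = -r_f(1)/r_f(0) = -0.4$ and $\sigma_1^2 = 1.05$.

```python
import cmath
import math

N = 512
TH = [-math.pi + 2.0 * math.pi * k / N for k in range(N)]
f = [abs(1 + 0.5 * cmath.exp(-1j * t)) ** 2 for t in TH]   # hypothetical input spectrum

def d_is(f, g):
    """Itakura-Saito distortion (4.1) for sampled, positive spectra."""
    return sum(a / b - 1.0 - math.log(a / b) for a, b in zip(f, g)) / len(f)

best = None                                       # (distortion, c, sigma^2)
for step in range(361):
    c = -0.9 + 0.005 * step                       # candidate A(z) = 1 + c z^{-1}, |c| < 1
    A2 = [abs(1 + c * cmath.exp(-1j * t)) ** 2 for t in TH]
    sigma2 = sum(a * b for a, b in zip(f, A2)) / N          # best gain for this A
    g = [sigma2 / v for v in A2]
    d = d_is(f, g)
    if best is None or d < best[0]:
        best = (d, c, sigma2)

r0, r1 = 1.25, 0.5                                # r_f(0), r_f(1) for this f
print(best[1], -r1 / r0)                          # grid minimizer vs a_1 from (2.10)
print(best[2], r0 * (1.0 - (r1 / r0) ** 2))       # its gain vs sigma_1^2
```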

A related but less well-known property of this measure under a Gaussian assumption is the following. If $f$ and $g$ are power spectral densities of two zero mean Gaussian processes, let $p_f^n(x^n)$ and $p_g^n(x^n)$, $x^n \in (-\infty,\infty)^n$, denote the resulting probability density functions and define the $n$th order relative entropy (or Kullback-Leibler number or directed divergence) [13,14,15]

$$I_n(f,g) = \int dx^n\, p_f^n(x^n)\ln\frac{p_f^n(x^n)}{p_g^n(x^n)}.$$
This quantity is used both in detection and in information theory. It is well known [13,14,15] that for Gaussian processes

$$I_n(f,g) = \frac{1}{2}\operatorname{tr}\bigl(T_n(f)T_n(g)^{-1}\bigr) - \frac{1}{2}\ln\frac{\det T_n(f)}{\det T_n(g)} - \frac{n}{2}, \tag{4.2}$$

where "tr" denotes the trace of a matrix and, as previously discussed, $T_n(f)$ is the $n$th order autocorrelation (Toeplitz) matrix of $f$. From the asymptotic eigenvalue theorem for Toeplitz matrices [8,16,17], the normalized directed divergence has limit

$$I(f,g) = \lim_{n\to\infty} n^{-1}I_n(f,g) = \frac{1}{2}(2\pi)^{-1}\int_{-\pi}^{\pi}\Bigl[\frac{f(\theta)}{g(\theta)} - 1 - \ln\frac{f(\theta)}{g(\theta)}\Bigr]d\theta = \frac{1}{2}d_{IS}(f,g), \tag{4.3}$$

that is, the Itakura-Saito distortion between $f$ and $g$ is exactly twice the asymptotic per-symbol Kullback-Leibler number under a Gaussian assumption.

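We note that

$$ d_{IS}(f,g) = (2\pi)^{-1}\int_{-\pi}^{\pi}\left[\frac{f(\theta)}{g(\theta)} - 1 - \ln\frac{f(\theta)}{g(\theta)}\right]d\theta = (2\pi)^{-1}\int_{-\pi}^{\pi}\frac{f(\theta)}{g(\theta)}\,d\theta - 1 - \ln\frac{\sigma_f^2}{\sigma_g^2}, \tag{4.4} $$

where the integrand in the leftmost integral is nonnegative by the inequality $\ln x \le x - 1$.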
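Identity (4.4) is easy to check numerically. The sketch below is a minimal illustration, not code from this report, and it rests on several assumptions:

- The two spectra are hypothetical autoregressive densities with minimum-phase denominators.
- Each $(2\pi)^{-1}\int_{-\pi}^{\pi}$ is replaced by an average over a uniform grid.
- The gain is computed as $\sigma^2 = \exp\{(2\pi)^{-1}\int \ln f\}$.

All function names (`avg`, `poly`, `ar`, `gain`, `d_is`) are illustrative.

```python
import cmath
import math

N = 2048  # grid size; uniform averaging is very accurate for smooth periodic integrands
THETA = [-math.pi + 2 * math.pi * k / N for k in range(N)]


def avg(values):
    """Approximate (2*pi)^-1 * integral over [-pi, pi] by a uniform-grid average."""
    return sum(values) / len(values)


def poly(a, t):
    """A(e^{it}) = sum_k a[k] e^{-ikt}; a[0] = 1 makes A monic."""
    return sum(ak * cmath.exp(-1j * k * t) for k, ak in enumerate(a))


def ar(s2, a):
    """Samples of the autoregressive spectral density s2 / |A(e^{it})|^2."""
    return [s2 / abs(poly(a, t)) ** 2 for t in THETA]


def gain(f):
    """sigma_f^2 = exp((2*pi)^-1 * integral of ln f)."""
    return math.exp(avg([math.log(x) for x in f]))


def d_is(f, g):
    """Itakura-Saito distortion (4.1): || f/g - 1 - ln(f/g) ||_1."""
    return avg([x / y - 1 - math.log(x / y) for x, y in zip(f, g)])


f = ar(1.0, [1.0, -0.9])        # hypothetical AR(1) spectrum
g = ar(2.0, [1.0, -0.5, 0.25])  # hypothetical AR(2) spectrum

r_fg = avg([x / y for x, y in zip(f, g)])        # r_{f/g}(0)
via_44 = r_fg - 1 - math.log(gain(f) / gain(g))  # right-hand side of (4.4)

print(f"d_IS(f,g) = {d_is(f, g):.6f}   via (4.4) = {via_44:.6f}")
print(f"gains: sigma_f^2 = {gain(f):.6f}, sigma_g^2 = {gain(g):.6f}")
```

The two printed values agree to rounding error. The grid average of $\ln(f/g)$ is exactly the logarithm of the ratio of the grid gains. The gains come out close to 1 and 2 because both denominators are minimum phase.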
2) The Itakura Distortion Measure


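3) Model Distortion Measures

Define the causal model (or filter) distortion measure

$$ d_{cm}(f,g) = \left\|1 - \frac{f^+}{g^+}\right\|_2^2 \tag{4.8} $$

and the gain-normalized causal model distortion

$$ d_{ncm}(f,g) = \left\|1 - \frac{f^+/\sigma_f}{g^+/\sigma_g}\right\|_2^2 \tag{4.9} $$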


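The gain-normalized causal model distortion measure is a gain-normalized spectral ratio distortion measure (and also a norm-spectral measure), since

$$ \begin{aligned} d_{ncm}(f,g) &= 1 - 2\,\mathrm{Re}\left\{(2\pi)^{-1}\int_{-\pi}^{\pi}\frac{f^+(\theta)/\sigma_f}{g^+(\theta)/\sigma_g}\,d\theta\right\} + (2\pi)^{-1}\int_{-\pi}^{\pi}\frac{f(\theta)/\sigma_f^2}{g(\theta)/\sigma_g^2}\,d\theta \\ &= (2\pi)^{-1}\int_{-\pi}^{\pi}\frac{f(\theta)/\sigma_f^2}{g(\theta)/\sigma_g^2}\,d\theta - 1 = \frac{\sigma_g^2\, r_{f/g}(0)}{\sigma_f^2} - 1 = e^{d_I(f,g)} - 1. \end{aligned} \tag{4.10} $$

Here we have used the fact that $f^+/\sigma_f$ and $\sigma_g/g^+$ are monic and causal (from (2.5)). Hence, from (2.6), the bracketed term above is 1.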


Note that

$$ d_{IS}(f/\sigma_f^2,\ g/\sigma_g^2) = d_{ncm}(f,g). $$

The gain-normalized causal model distortion measure was introduced by Itakura [18] as an approximation to the Itakura distortion measure for small values. Indeed, from (4.10) we have that for small $d_I(f,g)$

$$ d_{ncm}(f,g) \approx d_I(f,g). $$


Regardless of approximation, however, (4.10) implies (directly or from the inequality $\ln x \le x - 1$) that

$$ d_I(f,g) \le d_{ncm}(f,g). \tag{4.11} $$

The distortion measure $d_{ncm}$ shares a property of the Itakura-Saito distance. For fixed $f$ and the class $\mathcal{T}_m$, a minimum distortion $g \in \mathcal{T}_m$ has the form $g(\theta) = \sigma^2/|\hat A(e^{i\theta})|^2$, where $\hat A$ is defined by (2.10) but $\sigma^2$ is arbitrary.
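Relations (4.9)-(4.11) can be checked in the same way. The sketch below rests on these assumptions:

- Both spectra are hypothetical autoregressive densities $s^2/|A|^2$ with monic, minimum-phase $A$.
- For such densities the causal factor is $s/A$ and the gain is $s^2$.
- $d_I$ is taken as $\ln(\sigma_g^2 r_{f/g}(0)/\sigma_f^2)$, the form implied by (4.10).

Names are illustrative.

```python
import cmath
import math

N = 2048
THETA = [-math.pi + 2 * math.pi * k / N for k in range(N)]


def avg(values):
    """Approximate (2*pi)^-1 * integral over [-pi, pi] by a uniform-grid average."""
    return sum(values) / len(values)


def poly(a, t):
    """A(e^{it}) = sum_k a[k] e^{-ikt}, a[0] = 1."""
    return sum(ak * cmath.exp(-1j * k * t) for k, ak in enumerate(a))


def ar(s2, a):
    """Samples of s2 / |A(e^{it})|^2."""
    return [s2 / abs(poly(a, t)) ** 2 for t in THETA]


a_f, s2_f = [1.0, -0.9], 1.0        # hypothetical f = s2_f / |A_f|^2
a_g, s2_g = [1.0, -0.5, 0.25], 2.0  # hypothetical g = s2_g / |A_g|^2

# With A monic and minimum phase, f+ = sqrt(s2_f) / A_f and sigma_f^2 = s2_f,
# so (f+/sigma_f) / (g+/sigma_g) = A_g / A_f.
h = [poly(a_g, t) / poly(a_f, t) for t in THETA]
d_ncm_def = avg([abs(1 - z) ** 2 for z in h])  # definition (4.9)

f, g = ar(s2_f, a_f), ar(s2_g, a_g)
r_fg = avg([x / y for x, y in zip(f, g)])  # r_{f/g}(0)
ratio = s2_g * r_fg / s2_f                 # sigma_g^2 r_{f/g}(0) / sigma_f^2
d_ncm_410 = ratio - 1                      # closed form in (4.10)
d_I = math.log(ratio)                      # Itakura distortion, so d_ncm = e^{d_I} - 1

print(d_ncm_def, d_ncm_410, d_I)
```

The filter-domain value (4.9) and the spectral-ratio value (4.10) agree. The inequality $d_I \le d_{ncm}$ of (4.11) is visible in the output.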

Chaffee [19] also used the gain-normalized causal model distortion measure in his coding (or rate-distortion) approach to speech compression. He used the coding or quantization approach previously described to select a monic filter reproduction, and an alternate criterion to select the gain.

The causal model distortion measure is a slight generalization of $d_{ncm}$ and is introduced for comparison and interpretation purposes. Note that, analogous to (4.10),


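$$ d_{cm}(f,g) = 1 - 2\,\mathrm{Re}\left\{(2\pi)^{-1}\int_{-\pi}^{\pi}\frac{f^+(\theta)}{g^+(\theta)}\,d\theta\right\} + (2\pi)^{-1}\int_{-\pi}^{\pi}\frac{f(\theta)}{g(\theta)}\,d\theta = 1 + r_{f/g}(0) - 2\,\frac{\sigma_f}{\sigma_g}, \tag{4.12} $$

and hence $d_{cm}$ can be thought of as a gain-biased spectral ratio distortion measure.

We can also consider a gain-optimized causal model ratio measure. It is obtained by minimizing $d_{cm}(f,g)$ over the gain of $g$ with its monic part held fixed, and is easily shown to be

$$ \tilde d_{cm}(f,g) = 1 - \frac{\sigma_f^2}{\sigma_g^2\, r_{f/g}(0)} = 1 - \left(\frac{\sigma_g^2\, r_{f/g}(0)}{\sigma_f^2}\right)^{-1}. \tag{4.13} $$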
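The closed forms (4.12) and (4.13) can be compared with a direct evaluation of the definition (4.8). The sketch below makes these assumptions:

- Both spectra are the same hypothetical minimum-phase autoregressive densities as before.
- Scaling $g$ by $c$ scales its causal factor by $\sqrt{c}$.
- The minimization over the gain is done by a plain ternary search, chosen only because $d_{cm}(f, c\,g)$ is a convex quadratic in $u = 1/\sqrt{c}$.

This is an illustrative check, not a recommended optimizer.

```python
import cmath
import math

N = 2048
THETA = [-math.pi + 2 * math.pi * k / N for k in range(N)]


def avg(values):
    """Approximate (2*pi)^-1 * integral over [-pi, pi] by a uniform-grid average."""
    return sum(values) / len(values)


def poly(a, t):
    """A(e^{it}) = sum_k a[k] e^{-ikt}, a[0] = 1."""
    return sum(ak * cmath.exp(-1j * k * t) for k, ak in enumerate(a))


a_f, s2_f = [1.0, -0.9], 1.0        # hypothetical f = s2_f / |A_f|^2
a_g, s2_g = [1.0, -0.5, 0.25], 2.0  # hypothetical g = s2_g / |A_g|^2

h = [poly(a_g, t) / poly(a_f, t) for t in THETA]      # A_g / A_f (monic, causal)
r_fg = (s2_f / s2_g) * avg([abs(z) ** 2 for z in h])  # r_{f/g}(0)


def d_cm(c):
    """d_cm(f, c*g) = ||1 - f+/(c*g)+||_2^2, with f+/(c*g)+ = (sigma_f/(sqrt(c)*sigma_g)) * A_g/A_f."""
    k = math.sqrt(s2_f / (c * s2_g))
    return avg([abs(1 - k * z) ** 2 for z in h])


closed_412 = 1 + r_fg - 2 * math.sqrt(s2_f / s2_g)  # (4.12) at c = 1
closed_413 = 1 - s2_f / (s2_g * r_fg)               # (4.13)

# Ternary search over u = 1/sqrt(c), where d_cm is a convex quadratic.
lo, hi = 1e-3, 10.0
for _ in range(100):
    u1, u2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
    if d_cm(1 / u1 ** 2) < d_cm(1 / u2 ** 2):
        hi = u2
    else:
        lo = u1
best = d_cm(1 / lo ** 2)

print(d_cm(1.0), closed_412, best, closed_413)
```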


Another related measure is the noncausal model distortion measure (again introduced for comparison):

$$ d_{nm}(f,g) = \left\|1 - \left|\frac{f}{g}\right|^{1/2}\right\|_2^2 = 1 + r_{f/g}(0) - 2\,(2\pi)^{-1}\int_{-\pi}^{\pi}\left(\frac{f(\theta)}{g(\theta)}\right)^{1/2}d\theta. $$

This is a spectral ratio measure with $\varphi(x) = (1 - x^{1/2})^2$.


All the model distortion measures have the following interpretation. Say the spectral density is $f(\theta) = |F(\theta)|^2$, where $F(\theta)$ is the transfer function of a causal model $f^+$ or a noncausal model $f^{1/2}$. Similarly define $g(\theta) = |G(\theta)|^2$.

Consider the system of Fig. 1a. There $\{Y_n\}$ is a unit-variance, zero-mean white process driving the cascade of $F$ and $1/G$, whose output is $\{Z_n\}$. We have that

$$ \|1 - F/G\|_2^2 = E\left[(Z_n - Y_n)^2\right], \tag{4.14} $$

that is, $\|1 - F/G\|_2^2$ measures how nearly inverse the filters $F$ and $1/G$ are. It does so by measuring the average squared error between a white input process and the output of the cascade of $F$ and $1/G$. The closer $F$ and $G$ are, the more "white" the output of the cascade $F/G$ looks, since it is close in a squared-error sense to the white input.

Alternatively, consider the system of Fig. 1b. Here $1/F$ is a true whitening filter for $\{X_n\}$, producing $\{Y_n\}$, and $1/G$ is a "mismatched" whitening filter, producing $\{Z_n\}$. Here again

$$ \|1 - F/G\|_2^2 = E\left[(Z_n - Y_n)^2\right], $$

and hence $\|1 - F/G\|_2^2$ is a measure of the "mismatch" of $1/G$ to $F$: it measures the error power between the true whitened process and the mismatch-whitened process. This interpretation of $d_{ncm}$ is used by Gray and Markel [2], wherein $d_{ncm} = \sigma_g^2 r_{f/g}(0)/\sigma_f^2 - 1$.


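4) The Spectral Ratio Distortion Measure

$$ d_1(f,g) = \|1 - f/g\|_1 \tag{4.15} $$

This is a spectral ratio measure with $\varphi(x) = |1 - x|$. This measure provides some interesting comparisons with the model distortion measures. We have that

$$ \begin{aligned} d_1(f,g) &= (2\pi)^{-1}\int_{-\pi}^{\pi}\left|1 - \frac{f(\theta)}{g(\theta)}\right|d\theta = (2\pi)^{-1}\int_{-\pi}^{\pi}\left[1 + \frac{f(\theta)}{g(\theta)} - 2\min\left(\frac{f(\theta)}{g(\theta)}, 1\right)\right]d\theta \\ &= 1 + r_{f/g}(0) - 2\,(2\pi)^{-1}\int_{-\pi}^{\pi}\min\left(\frac{f(\theta)}{g(\theta)}, 1\right)d\theta. \end{aligned} \tag{4.16} $$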

The $L_1$ ratio distortion provides an alternative measure of "mismatch" to the model distortion measures. Recall from Fig. 1 that the model distortion measure computed $E[(Z_n - Y_n)^2]$. Alternatively, we can measure mismatch by

$$ \left|E Z_n^2 - E Y_n^2\right| = \left|r_{f/g}(0) - 1\right| \le \|1 - f/g\|_1, \tag{4.17} $$

so that $d_1$ is an upper bound on the difference of the output powers, as opposed to the power of the difference.
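As a quick sanity check, the sketch below evaluates identity (4.16) and bound (4.17) on a grid. It uses the same two hypothetical autoregressive spectra as the earlier checks. Grid averaging stands in for the integrals, and all names are illustrative.

```python
import cmath
import math

N = 2048
THETA = [-math.pi + 2 * math.pi * k / N for k in range(N)]


def avg(values):
    """Approximate (2*pi)^-1 * integral over [-pi, pi] by a uniform-grid average."""
    return sum(values) / len(values)


def poly(a, t):
    """A(e^{it}) = sum_k a[k] e^{-ikt}, a[0] = 1."""
    return sum(ak * cmath.exp(-1j * k * t) for k, ak in enumerate(a))


def ar(s2, a):
    """Samples of s2 / |A(e^{it})|^2."""
    return [s2 / abs(poly(a, t)) ** 2 for t in THETA]


f = ar(1.0, [1.0, -0.9])        # hypothetical spectra
g = ar(2.0, [1.0, -0.5, 0.25])
q = [x / y for x, y in zip(f, g)]  # f/g on the grid

d1 = avg([abs(1 - v) for v in q])                        # definition (4.15)
r_fg = avg(q)                                            # r_{f/g}(0)
d1_416 = 1 + r_fg - 2 * avg([min(v, 1.0) for v in q])    # right-hand side of (4.16)
power_gap = abs(r_fg - 1)                                # |E Z_n^2 - E Y_n^2| in (4.17)

print(d1, d1_416, power_gap)
```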

All of the preceding measures are nonsymmetric. We next consider several symmetric distortion measures.
5) Log Spectral Deviation

$$ d_{\log}(f,g) = \|\ln(f/g)\|_p = \|\ln f - \ln g\|_p \tag{4.18} $$

The most common choices for $p$ are 2, 1, or $\infty$. This is a spectral ratio measure with $\varphi(x) = |\ln x|^p$ (followed by a $p$th root). It is one of the most commonly proposed distortion measures for speech [5,2,6].

For $p = 2$ there exist fast techniques for computing $d_{\log}$ using cepstral approximations. However, there do not seem to be fast algorithms for finding the best fit (say in $\mathcal{T}_m$) to a given $f$. Note that $d_{\log}$ is a metric since

$$ d_{\log}(f,h) \le d_{\log}(f,g) + d_{\log}(g,h). $$
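The sketch below illustrates three points on hypothetical autoregressive spectra:

- The triangle inequality holds for $p = 1, 2, \infty$.
- The norms increase with $p$.
- For $p = 2$, Parseval's relation gives $d_{\log}^2 = \sum_k (c_f(k) - c_g(k))^2$, where $c(k)$ are the cepstral coefficients of the log spectra.

The cepstral sum is truncated at an assumed $K = 80$, which is ample here because these cepstra decay geometrically. The sup norm is approximated by the grid maximum. Names are illustrative.

```python
import cmath
import math

N = 2048
THETA = [-math.pi + 2 * math.pi * k / N for k in range(N)]


def avg(values):
    """Approximate (2*pi)^-1 * integral over [-pi, pi] by a uniform-grid average."""
    return sum(values) / len(values)


def poly(a, t):
    """A(e^{it}) = sum_k a[k] e^{-ikt}, a[0] = 1."""
    return sum(ak * cmath.exp(-1j * k * t) for k, ak in enumerate(a))


def ar(s2, a):
    """Samples of s2 / |A(e^{it})|^2."""
    return [s2 / abs(poly(a, t)) ** 2 for t in THETA]


def d_log(f, g, p):
    """Log spectral deviation (4.18); p = math.inf uses the grid maximum."""
    dev = [abs(math.log(x) - math.log(y)) for x, y in zip(f, g)]
    if p == math.inf:
        return max(dev)
    return avg([d ** p for d in dev]) ** (1.0 / p)


f = ar(1.0, [1.0, -0.9])        # hypothetical spectra
g = ar(2.0, [1.0, -0.5, 0.25])
h = ar(1.5, [1.0, 0.3])

for p in (1, 2, math.inf):
    print(p, d_log(f, h, p), "<=", d_log(f, g, p) + d_log(g, h, p))

# p = 2 through cepstral coefficients c(k) = avg(ln spectrum * cos(k*theta)).
K = 80
dc = [avg([(math.log(x) - math.log(y)) * math.cos(k * t) for x, y, t in zip(f, g, THETA)])
      for k in range(K + 1)]
d2_cepstral = math.sqrt(dc[0] ** 2 + 2 * sum(c * c for c in dc[1:]))
print(d2_cepstral, d_log(f, g, 2))
```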

The remaining measures are all symmetrized versions of measures 1-4. As was shown in Section 3, there are several equivalent means of symmetrizing measures, and hence we can choose the simplest or most useful.


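6) The Cosh Distortion Measure

Symmetrizing the Itakura-Saito distortion as in (3.8), we obtain

$$ d_{\cosh}(f,g) = (2\pi)^{-1}\int_{-\pi}^{\pi}\frac{[f(\theta) - g(\theta)]^2}{2\,f(\theta)\,g(\theta)}\,d\theta = \frac{1}{2}\left\|\left(\frac{f}{g}\right)^{1/2} - \left(\frac{g}{f}\right)^{1/2}\right\|_2^2. \tag{4.19} $$

We can also write

$$ d_{\cosh}(f,g) = \frac{1}{2}\,d_{IS}(f,g) + \frac{1}{2}\,d_{IS}(g,f) = d_{IS}^{(1)}(f,g), $$

where $d^{(q)}$ is defined by (3.11). That is, both types of symmetrization yield effectively the same measure. This distortion measure was introduced by Gray and Markel [2]. It is called the cosh distortion measure since

$$ d_{\cosh}(f,g) = \left\|\cosh\left(\ln\frac{f}{g}\right) - 1\right\|_1. $$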

This measure has some interesting interpretations. First recall the directed divergence discussion of Gaussian processes and the Itakura-Saito distortion. In statistics, detection theory, and information theory [13,14,15] one often uses the symmetrized directed divergence $J = I(1,2) + I(2,1)$, where $J$ is called simply the divergence. Defining $J_n = I_n(f,g) + I_n(g,f)$, we have from (4.2)-(4.3) that

$$ \lim_{n\to\infty} n^{-1} J_n = \frac{1}{2}\,d_{IS}(f,g) + \frac{1}{2}\,d_{IS}(g,f) = d_{\cosh}(f,g). $$

That is, under a Gaussian assumption, the cosh measure is exactly the asymptotic normalized divergence between processes having spectral densities $f$ and $g$.
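The three expressions for the cosh measure can be compared directly:

- the hyperbolic-cosine form;
- the average of the two Itakura-Saito distances;
- the square-root form (4.19).

The sketch below does this on hypothetical autoregressive spectra using grid averages. It does not attempt the Toeplitz-determinant limit behind the divergence interpretation. Names are illustrative.

```python
import cmath
import math

N = 2048
THETA = [-math.pi + 2 * math.pi * k / N for k in range(N)]


def avg(values):
    """Approximate (2*pi)^-1 * integral over [-pi, pi] by a uniform-grid average."""
    return sum(values) / len(values)


def poly(a, t):
    """A(e^{it}) = sum_k a[k] e^{-ikt}, a[0] = 1."""
    return sum(ak * cmath.exp(-1j * k * t) for k, ak in enumerate(a))


def ar(s2, a):
    """Samples of s2 / |A(e^{it})|^2."""
    return [s2 / abs(poly(a, t)) ** 2 for t in THETA]


def d_is(f, g):
    """Itakura-Saito distortion (4.1)."""
    return avg([x / y - 1 - math.log(x / y) for x, y in zip(f, g)])


f = ar(1.0, [1.0, -0.9])        # hypothetical spectra
g = ar(2.0, [1.0, -0.5, 0.25])
q = [x / y for x, y in zip(f, g)]

cosh_form = avg([math.cosh(math.log(v)) - 1 for v in q])                   # ||cosh(ln f/g) - 1||_1
half_is = 0.5 * d_is(f, g) + 0.5 * d_is(g, f)                               # symmetrized d_IS
root_form = 0.5 * avg([(math.sqrt(v) - 1 / math.sqrt(v)) ** 2 for v in q])  # (4.19)

print(cosh_form, half_is, root_form)
```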

A second interpretation comes from the theory of random processes. Say we have two stationary Gaussian processes $\{U_n\}$ and $\{V_n\}$ with spectral densities $f_1$ and $f_2$, respectively. The squared-error $\bar\rho$-distance [20] (or Ornstein distance) between these processes is defined by

$$ \bar\rho(f_1,f_2) = \inf E\left[(U_0 - V_0)^2\right]. $$

The infimum is over all consistent joint probabilistic descriptions of $\{U_n\}$ and $\{V_n\}$, that is, over all pair processes $\{U_n, V_n\}$ having the original processes as coordinates. Intuitively, given $\{U_n\}$ and $\{V_n\}$, $\bar\rho$ measures how well we can "fit" the two processes together in a squared-error sense. For Gaussian processes [20]

$$ \bar\rho(f_1,f_2) = \left\|f_1^{1/2} - f_2^{1/2}\right\|_2^2, $$

and hence

$$ d_{\cosh}(f,g) = \frac{1}{2}\left\|\left(\frac{f}{g}\right)^{1/2} - \left(\frac{g}{f}\right)^{1/2}\right\|_2^2 = \frac{1}{2}\,\bar\rho(f/g,\ g/f). $$

So the cosh measure is one-half the $\bar\rho$-distance between the "mismatch" whitened processes with spectral densities $f/g$ and $g/f$. Intuitively, instead of comparing $f/g$ to one to see how nearly inverse $f$ and $1/g$ are, we compare $f/g$ to its own inverse $g/f$. We note that even if the processes are not Gaussian, $\bar\rho(f_1,f_2) \ge \|f_1^{1/2} - f_2^{1/2}\|_2^2$ [20].


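7) Gain-Optimized Cosh Measure

The Itakura distortion was obtained from the Itakura-Saito distance by choosing a reproduction gain. The cosh measure can be modified in a similar fashion (as suggested by Gray and Markel [2]). Form

$$ d_{\cosh}\left(f, \frac{\sigma^2}{\sigma_g^2}\,g\right) = \frac{1}{2}\left\{\frac{\sigma_g^2}{\sigma^2}\,r_{f/g}(0) + \frac{\sigma^2}{\sigma_g^2}\,r_{g/f}(0) - 2\right\} $$

and use calculus to minimize this over $\sigma^2$. The minimizing gain is

$$ \sigma^2 = \sigma_g^2\left(\frac{r_{f/g}(0)}{r_{g/f}(0)}\right)^{1/2}, \tag{4.20} $$

resulting in

$$ \tilde d_{\cosh}(f,g) = \min_{\sigma^2} d_{\cosh}\left(f, \frac{\sigma^2}{\sigma_g^2}\,g\right) = \left[r_{f/g}(0)\,r_{g/f}(0)\right]^{1/2} - 1. \tag{4.21} $$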
8) Symmetrized Itakura Distortion

$$ d_I^{(1)}(f,g) = \frac{1}{2}\left[d_I(f,g) + d_I(g,f)\right] \tag{4.22} $$

Note that

$$ \tilde d_{\cosh}(f,g) = e^{d_I^{(1)}(f,g)} - 1. \tag{4.23} $$
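The sketch below checks three things numerically:

- the gain optimization for the cosh measure;
- the closed form $[r_{f/g}(0)\,r_{g/f}(0)]^{1/2} - 1$;
- relation (4.23).

It makes these assumptions:

- The spectra are hypothetical autoregressive densities.
- $d_I(f,g) = \ln(\sigma_g^2 r_{f/g}(0)/\sigma_f^2)$, as implied by (4.10).
- The numerical minimum comes from a ternary search over $\ln c$, where $c = \sigma^2/\sigma_g^2$ scales $g$. This works because $d_{\cosh}(f, c\,g)$ is convex in $\ln c$.

Names are illustrative.

```python
import cmath
import math

N = 2048
THETA = [-math.pi + 2 * math.pi * k / N for k in range(N)]


def avg(values):
    """Approximate (2*pi)^-1 * integral over [-pi, pi] by a uniform-grid average."""
    return sum(values) / len(values)


def poly(a, t):
    """A(e^{it}) = sum_k a[k] e^{-ikt}, a[0] = 1."""
    return sum(ak * cmath.exp(-1j * k * t) for k, ak in enumerate(a))


def ar(s2, a):
    """Samples of s2 / |A(e^{it})|^2."""
    return [s2 / abs(poly(a, t)) ** 2 for t in THETA]


def gain(f):
    """sigma_f^2 = exp(avg ln f)."""
    return math.exp(avg([math.log(x) for x in f]))


def d_cosh(f, g):
    """Cosh measure: avg of (f/g + g/f)/2 - 1."""
    return avg([0.5 * (x / y + y / x) - 1 for x, y in zip(f, g)])


f = ar(1.0, [1.0, -0.9])        # hypothetical spectra
g = ar(2.0, [1.0, -0.5, 0.25])
r_fg = avg([x / y for x, y in zip(f, g)])
r_gf = avg([y / x for x, y in zip(f, g)])
s2_f, s2_g = gain(f), gain(g)

closed_421 = math.sqrt(r_fg * r_gf) - 1              # (4.21)
d_I_fg = math.log(s2_g * r_fg / s2_f)                # d_I(f, g)
d_I_gf = math.log(s2_f * r_gf / s2_g)                # d_I(g, f)
via_423 = math.exp(0.5 * (d_I_fg + d_I_gf)) - 1      # (4.22)-(4.23)
c_420 = math.sqrt(r_fg / r_gf)                       # optimal sigma^2 / sigma_g^2, (4.20)


def scaled(c):
    """d_cosh(f, c*g) for a gain scale c."""
    return d_cosh(f, [c * y for y in g])


lo, hi = -10.0, 10.0  # search interval for ln c
for _ in range(100):
    u1, u2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
    if scaled(math.exp(u1)) < scaled(math.exp(u2)):
        hi = u2
    else:
        lo = u1
best = scaled(math.exp(lo))

print(best, closed_421, via_423, math.exp(lo), c_420)
```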

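9) Symmetrized Model Distortion

$$ d_{cm}^{(q)}(f,g) = \left[\frac{1}{2}\left(\left\|1 - \frac{f^+}{g^+}\right\|_2^{2q} + \left\|1 - \frac{g^+}{f^+}\right\|_2^{2q}\right)\right]^{1/q} \tag{4.24} $$

$$ d_{ncm}^{(q)}(f,g) = \left[\frac{1}{2}\left(\left\|1 - \frac{f^+/\sigma_f}{g^+/\sigma_g}\right\|_2^{2q} + \left\|1 - \frac{g^+/\sigma_g}{f^+/\sigma_f}\right\|_2^{2q}\right)\right]^{1/q} \tag{4.25} $$

$$ d_{nm}^{(q)}(f,g) = \left[\frac{1}{2}\left(\left\|1 - \left|\frac{f}{g}\right|^{1/2}\right\|_2^{2q} + \left\|1 - \left|\frac{g}{f}\right|^{1/2}\right\|_2^{2q}\right)\right]^{1/q} \tag{4.26} $$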

10) Symmetrized $L_1$ Ratio Distortion

$$ d_1^{(1)}(f,g) = \|1 - f/g\|_1 + \|1 - g/f\|_1 \tag{4.27} $$

This has another form. Using the fact that

$$ |1 - a| + |1 - 1/a| = |a - 1/a|, \qquad a > 0, \tag{4.28} $$

we have

$$ d_1^{(1)}(f,g) = \|f/g - g/f\|_1, \tag{4.29} $$

which is an analog of the cosh measure.
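Identity (4.28) and its integrated form (4.29) can be spot-checked on random positive numbers and on a pair of hypothetical autoregressive spectra. The seed, sample range, and spectra below are arbitrary choices for illustration.

```python
import cmath
import math
import random

N = 2048
THETA = [-math.pi + 2 * math.pi * k / N for k in range(N)]


def avg(values):
    """Approximate (2*pi)^-1 * integral over [-pi, pi] by a uniform-grid average."""
    return sum(values) / len(values)


def poly(a, t):
    """A(e^{it}) = sum_k a[k] e^{-ikt}, a[0] = 1."""
    return sum(ak * cmath.exp(-1j * k * t) for k, ak in enumerate(a))


def ar(s2, a):
    """Samples of s2 / |A(e^{it})|^2."""
    return [s2 / abs(poly(a, t)) ** 2 for t in THETA]


rng = random.Random(0)
samples = [rng.uniform(0.01, 100.0) for _ in range(1000)]
# (4.28): |1 - a| + |1 - 1/a| = |a - 1/a| for a > 0
max_err = max(abs(abs(1 - a) + abs(1 - 1 / a) - abs(a - 1 / a)) for a in samples)

f = ar(1.0, [1.0, -0.9])        # hypothetical spectra
g = ar(2.0, [1.0, -0.5, 0.25])
q = [x / y for x, y in zip(f, g)]
lhs = avg([abs(1 - v) for v in q]) + avg([abs(1 - 1 / v) for v in q])  # (4.27)
rhs = avg([abs(v - 1 / v) for v in q])                                   # (4.29)

print(max_err, lhs, rhs)
```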

Many other distortion measures can be defined by combining the previous measures as in Section 3, but the preceding are the basic measures considered here.
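Many of the distortion measures considered have an interesting property. The distortion between $f$ and $g$ is bounded below by the distortion between two white processes having the same gains, that is, between the innovations processes of $f$ and $g$. We say that a distortion measure has the innovations property if $d(f,g) \ge d(\sigma_f^2, \sigma_g^2)$. Clearly this is a trivial property for gain-normalized distortion measures, since then $d(\sigma_f^2,\sigma_g^2) = d(1,1) = 0 \le d(f,g)$.

By Jensen's inequality, $r_{f/g}(0) \ge \exp\{(2\pi)^{-1}\int_{-\pi}^{\pi}\ln[f(\theta)/g(\theta)]\,d\theta\} = \sigma_f^2/\sigma_g^2$. Hence, from (4.4) and (4.19),

$$ d_{IS}(f,g) \ge \frac{\sigma_f^2}{\sigma_g^2} - 1 - \ln\frac{\sigma_f^2}{\sigma_g^2} = d_{IS}(\sigma_f^2,\sigma_g^2) \tag{5.1} $$

$$ d_{\cosh}(f,g) \ge \frac{1}{2}\left(\frac{\sigma_f^2}{\sigma_g^2} + \frac{\sigma_g^2}{\sigma_f^2}\right) - 1 = d_{\cosh}(\sigma_f^2,\sigma_g^2). \tag{5.2} $$

Equation (5.2) shows that if $d_{\cosh}(f,g_n) \to 0$ as $n \to \infty$, then necessarily $\sigma_{g_n}^2 \to \sigma_f^2$. From (4.12),

$$ d_{cm}(\sigma_f^2,\sigma_g^2) = \left(1 - \frac{\sigma_f}{\sigma_g}\right)^2, $$

whence

$$ d_{cm}(f,g) \ge d_{cm}(\sigma_f^2,\sigma_g^2), \tag{5.3} $$

and hence $d_{cm}(f,g_n) \to 0$ also implies $\sigma_{g_n}^2 \to \sigma_f^2$. For the log spectral deviation we have

$$ d_{\log}(f,g) = \|\ln f/g\|_p \ge \|\ln f/g\|_1 = (2\pi)^{-1}\int_{-\pi}^{\pi}\left|\ln\frac{f(\theta)}{g(\theta)}\right|d\theta \ge \left|(2\pi)^{-1}\int_{-\pi}^{\pi}\ln\frac{f(\theta)}{g(\theta)}\,d\theta\right| = \left|\ln\frac{\sigma_f^2}{\sigma_g^2}\right| = \left\|\ln\frac{\sigma_f^2}{\sigma_g^2}\right\|_p $$

and therefore

$$ d_{\log}(f,g) \ge d_{\log}(\sigma_f^2,\sigma_g^2). $$

It is not known whether the ratio or noncausal model distortions possess this property.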
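The innovations bounds (5.1)-(5.3) and the log spectral bound can be checked numerically. The same run also checks the identity (5.8) derived below, which says that removing the innovations distortion makes $d_{IS}$ and $d_{cm}$ coincide. This is a sketch on hypothetical autoregressive spectra. $d_{cm}$ is evaluated through its closed form (4.12), and names are illustrative.

```python
import cmath
import math

N = 2048
THETA = [-math.pi + 2 * math.pi * k / N for k in range(N)]


def avg(values):
    """Approximate (2*pi)^-1 * integral over [-pi, pi] by a uniform-grid average."""
    return sum(values) / len(values)


def poly(a, t):
    """A(e^{it}) = sum_k a[k] e^{-ikt}, a[0] = 1."""
    return sum(ak * cmath.exp(-1j * k * t) for k, ak in enumerate(a))


def ar(s2, a):
    """Samples of s2 / |A(e^{it})|^2."""
    return [s2 / abs(poly(a, t)) ** 2 for t in THETA]


def gain(f):
    """sigma_f^2 = exp(avg ln f)."""
    return math.exp(avg([math.log(x) for x in f]))


f = ar(1.0, [1.0, -0.9])        # hypothetical spectra
g = ar(2.0, [1.0, -0.5, 0.25])
q = [a / b for a, b in zip(f, g)]
x = gain(f) / gain(g)           # sigma_f^2 / sigma_g^2

d_is = avg([v - 1 - math.log(v) for v in q])
d_is_white = x - 1 - math.log(x)                          # (5.1)
d_cosh = avg([0.5 * (v + 1 / v) - 1 for v in q])
d_cosh_white = 0.5 * (x + 1 / x) - 1                      # (5.2)
d_cm = 1 + avg(q) - 2 * math.sqrt(x)                      # (4.12)
d_cm_white = (1 - math.sqrt(x)) ** 2                      # (5.3)
d_log2 = math.sqrt(avg([math.log(v) ** 2 for v in q]))    # p = 2
d_log_white = abs(math.log(x))

print(d_is - d_is_white, d_cm - d_cm_white)  # equal, as in (5.8)
```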


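Nonsymmetric Distortion Implications

We have that

$$ (2\pi)^{-1}\int_{-\pi}^{\pi}\left(\frac{f(\theta)}{g(\theta)}\right)^{1/2}d\theta \ge \exp\left\{(2\pi)^{-1}\int_{-\pi}^{\pi}\ln\left(\frac{f(\theta)}{g(\theta)}\right)^{1/2}d\theta\right\} = \frac{\sigma_f}{\sigma_g} \tag{5.6} $$

and hence, from (4.12) and the expression for $d_{nm}$ above,

$$ d_{cm}(f,g) \ge d_{nm}(f,g), \tag{5.7} $$

so that $d_{cm} \Rightarrow d_{nm}$. We have from (4.4) and (4.12) that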

$$ d_{IS}(f,g) = \left[r_{f/g}(0) - \frac{\sigma_f^2}{\sigma_g^2}\right] + \left[\frac{\sigma_f^2}{\sigma_g^2} - 1 - \ln\frac{\sigma_f^2}{\sigma_g^2}\right] = d_{cm}(f,g) - d_{cm}(\sigma_f^2,\sigma_g^2) + d_{IS}(\sigma_f^2,\sigma_g^2) \tag{5.8} $$

and hence

$$ d_{IS}(f,g) - d_{IS}(\sigma_f^2,\sigma_g^2) = d_{cm}(f,g) - d_{cm}(\sigma_f^2,\sigma_g^2), $$

so that additively removing the innovations distortion makes the Itakura-Saito and causal model distortions the same. Expanding and cancelling in (5.8) yields

$$ d_{IS}(f,g) = d_{cm}(f,g) + 2\left(\frac{\sigma_f}{\sigma_g} - 1 - \ln\frac{\sigma_f}{\sigma_g}\right) \ge d_{cm}(f,g), \tag{5.9} $$

and hence $d_{IS} \Rightarrow d_{cm} \Rightarrow d_{nm}$. Next observe that if $x \to 1$, then $x - 1 - \ln x \to 0$.

Thus suppose $d_{cm}(f,g_n) \to 0$ as $n \to \infty$. From the innovations property, $|1 - \sigma_f/\sigma_{g_n}| \to 0$, whence $\sigma_f/\sigma_{g_n} - 1 - \ln(\sigma_f/\sigma_{g_n}) \to 0$. Therefore $d_{cm}(f,g_n) \to 0$ implies $d_{IS}(f,g_n) \to 0$. We have thus shown that

$$ d_{IS} \Leftrightarrow d_{cm} \Rightarrow d_{nm}. \tag{5.10} $$


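We have from (4.11) that $d_{ncm} \Rightarrow d_I$, and from (4.10) and (4.12) that

$$ d_{cm}(f,g) = \frac{\sigma_f^2}{\sigma_g^2}\,d_{ncm}(f,g) + \left(1 - \frac{\sigma_f}{\sigma_g}\right)^2. $$

Therefore, using the innovations property of $d_{cm}$,

$$ d_{ncm}(f,g) = \frac{\sigma_g^2}{\sigma_f^2}\left[d_{cm}(f,g) - d_{cm}(\sigma_f^2,\sigma_g^2)\right] \le \frac{\sigma_g^2}{\sigma_f^2}\,d_{cm}(f,g), \tag{5.11} $$

which implies (since $d_{cm}(f,g_n) \to 0$ forces $\sigma_{g_n}^2 \to \sigma_f^2$)

$$ d_{cm} \Rightarrow d_{ncm}. \tag{5.12} $$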


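Recall from (4.10) that

$$ d_{ncm}(f,g) = e^{d_I(f,g)} - 1, \tag{5.13} $$

which implies that

$$ d_{ncm} \Leftrightarrow d_I. \tag{5.14} $$

To see this, note that

$$ d_I(f,g) = \ln\left(d_{ncm}(f,g) + 1\right) \le d_{ncm}(f,g), $$

and if $d_I(f,g_n) \to 0$ as $n \to \infty$, then

$$ d_{ncm}(f,g_n) = \sum_{k=1}^{\infty}\frac{d_I(f,g_n)^k}{k!} \to 0. \tag{5.15} $$

From the inequality

$$ x^{1/2} \ge \min(1,x), \qquad x \ge 0, \tag{5.18} $$

together with (4.16) and the expression for $d_{nm}$, we have

$$ d_{nm}(f,g) \le d_1(f,g) \tag{5.19} $$

and therefore

$$ d_1 \Rightarrow d_{nm}. \tag{5.20} $$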


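Note that (5.19) also follows from the definitions and the inequality

$$ \left(1 - x^{1/2}\right)^2 \le |1 - x|, \qquad x \ge 0 \tag{5.21} $$

(which follows from (5.18)).

The implications for nonsymmetric distortion measures are summarized below:

$$ d_{IS} \Leftrightarrow d_{cm} \Rightarrow d_{ncm} \Leftrightarrow d_I, \qquad d_{cm} \Rightarrow d_{nm} \Leftarrow d_1. \tag{5.22} $$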
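The nonsymmetric chain can be illustrated on one pair of spectra. The sketch checks the following:

- $d_{IS} \ge d_{cm} \ge d_{nm}$ with the exact gap (5.9);
- $d_{nm} \le d_1$;
- $d_I \le d_{ncm}$;
- the bound (5.11).

It assumes hypothetical minimum-phase autoregressive spectra, so the causal factors are $s/A$. $d_{cm}$ and $d_{ncm}$ are computed from the filter definitions (4.8)-(4.9) rather than the closed forms. Names are illustrative.

```python
import cmath
import math

N = 2048
THETA = [-math.pi + 2 * math.pi * k / N for k in range(N)]


def avg(values):
    """Approximate (2*pi)^-1 * integral over [-pi, pi] by a uniform-grid average."""
    return sum(values) / len(values)


def poly(a, t):
    """A(e^{it}) = sum_k a[k] e^{-ikt}, a[0] = 1."""
    return sum(ak * cmath.exp(-1j * k * t) for k, ak in enumerate(a))


def ar(s2, a):
    """Samples of s2 / |A(e^{it})|^2."""
    return [s2 / abs(poly(a, t)) ** 2 for t in THETA]


a_f, s2_f = [1.0, -0.9], 1.0        # hypothetical f
a_g, s2_g = [1.0, -0.5, 0.25], 2.0  # hypothetical g
f, g = ar(s2_f, a_f), ar(s2_g, a_g)
q = [x / y for x, y in zip(f, g)]
h = [poly(a_g, t) / poly(a_f, t) for t in THETA]  # (f+/sigma_f)/(g+/sigma_g)
k = math.sqrt(s2_f / s2_g)                        # sigma_f / sigma_g

d_is = avg([v - 1 - math.log(v) for v in q])          # (4.1)
d_cm = avg([abs(1 - k * z) ** 2 for z in h])          # (4.8)
d_ncm = avg([abs(1 - z) ** 2 for z in h])             # (4.9)
d_nm = avg([(1 - math.sqrt(v)) ** 2 for v in q])      # noncausal model
d_1 = avg([abs(1 - v) for v in q])                    # (4.15)
d_I = math.log(1 + d_ncm)                             # from (4.10)
rhs_59 = d_cm + 2 * (k - 1 - math.log(k))             # (5.9)

print(d_is, d_cm, d_nm, d_1, d_I, d_ncm)
```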
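Note that $d_{ncm}$ can be written in a manner strongly resembling $d_1$, as follows:

$$ d_{ncm}(f,g) = \frac{\sigma_g^2\, r_{f/g}(0)}{\sigma_f^2} - 1 = (2\pi)^{-1}\int_{-\pi}^{\pi}\left[\frac{f(\theta)/\sigma_f^2}{g(\theta)/\sigma_g^2} - 1\right]d\theta. $$

That is, $d_{ncm}(f,g)$ is bounded by a gain-normalized version of $d_1(f,g)$ and equals a gain-normalized version of $d_{cm}(f,g)$:

$$ d_{ncm}(f,g) \le d_1(f/\sigma_f^2,\ g/\sigma_g^2), \qquad d_{ncm}(f,g) = d_{cm}(f/\sigma_f^2,\ g/\sigma_g^2). \tag{5.23} $$

This does not directly imply that $d_1 \Rightarrow d_{ncm}$, since we have been unable to show that $d_1(f,g_n) \to 0$ implies $\sigma_{g_n}^2/\sigma_f^2 \to 1$. It can easily be shown, however, that

$$ \|1 - f/g\|_1 + \left|1 - \frac{\sigma_f^2}{\sigma_g^2}\right| = d_1(f,g) + d_1(\sigma_f^2,\sigma_g^2) \Rightarrow d_{ncm}. \tag{5.24} $$

Finally, consider coding equivalence. Since $d_I$ and $d_{ncm}$ are related monotonically by (4.10), clearly

$$ d_I \stackrel{c}{\Leftrightarrow} d_{ncm}. \tag{5.25} $$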
Symmetric Distortion Implications

We first focus on the cosh distortion and the log spectral deviation, as these are the most commonly proposed measures in the speech literature. We then develop their relations to the other measures. Gray and Markel [2] proved graphically that

$$ \|\ln f/g\|_2^2 \le 2\,d_{\cosh}(f,g), $$

and hence that $d_{\cosh} \Rightarrow d_{\log}$ (for $p = 2$). An alternative to their proof is the following Taylor series argument. For real $x > 0$ set $x = e^a$; then

$$ \begin{aligned} \left|x^{1/2} - x^{-1/2}\right| &= \left|e^{a/2} - e^{-a/2}\right| = \left|\sum_{k=0}^{\infty}\frac{(a/2)^k}{k!} - \sum_{k=0}^{\infty}\frac{(-a/2)^k}{k!}\right| = 2\left|\sum_{k=1,3,5,\ldots}\frac{(a/2)^k}{k!}\right| \\ &= \left|a + 2\sum_{k=3,5,\ldots}\frac{(a/2)^k}{k!}\right| \ge |a| = |\ln x|. \end{aligned} $$

The inequality holds because every odd-power term has the same sign as $a$. Hence

$$ \|\ln f/g\|_2^2 \le \left\|(f/g)^{1/2} - (g/f)^{1/2}\right\|_2^2 = 2\,d_{\cosh}(f,g). $$

The converse implication is not true in general, as can easily be seen by counterexample.
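The pointwise inequality $|\ln x| \le |x^{1/2} - x^{-1/2}|$ and its integrated form $\|\ln f/g\|_2^2 \le 2\,d_{\cosh}(f,g)$ can be spot-checked as below. The sketch samples $x = e^a$ for $a \in [-10, 10]$ and reuses the hypothetical autoregressive spectra from the earlier checks. It illustrates the bound rather than proving it.

```python
import cmath
import math

N = 2048
THETA = [-math.pi + 2 * math.pi * k / N for k in range(N)]


def avg(values):
    """Approximate (2*pi)^-1 * integral over [-pi, pi] by a uniform-grid average."""
    return sum(values) / len(values)


def poly(a, t):
    """A(e^{it}) = sum_k a[k] e^{-ikt}, a[0] = 1."""
    return sum(ak * cmath.exp(-1j * k * t) for k, ak in enumerate(a))


def ar(s2, a):
    """Samples of s2 / |A(e^{it})|^2."""
    return [s2 / abs(poly(a, t)) ** 2 for t in THETA]


# Pointwise: |x^{1/2} - x^{-1/2}| - |ln x| >= 0 for x = e^a.
xs = [math.exp(i / 100) for i in range(-1000, 1001)]
worst = min(abs(math.sqrt(x) - 1 / math.sqrt(x)) - abs(math.log(x)) for x in xs)

f = ar(1.0, [1.0, -0.9])        # hypothetical spectra
g = ar(2.0, [1.0, -0.5, 0.25])
q = [x / y for x, y in zip(f, g)]
lhs = avg([math.log(v) ** 2 for v in q])            # ||ln f/g||_2^2
rhs = 2 * avg([0.5 * (v + 1 / v) - 1 for v in q])   # 2 d_cosh(f, g)

print(worst, lhs, rhs)
```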

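From (4.12), (4.19), and (5.7) we have that

$$ \begin{aligned} d_{\cosh}(f,g) &= \frac{1}{2}\left[r_{f/g}(0) + r_{g/f}(0)\right] - 1 = \frac{1}{2}\left[d_{cm}(f,g) + d_{cm}(g,f) + 2\frac{\sigma_f}{\sigma_g} + 2\frac{\sigma_g}{\sigma_f} - 4\right] \\ &\ge d_{cm}^{(1)}(f,g) \ge d_{nm}^{(1)}(f,g), \end{aligned} \tag{5.27} $$

and therefore $d_{\cosh} \Rightarrow d_{cm}^{(1)} \Rightarrow d_{nm}^{(1)}$. From (5.7),

$$ d_{cm}^{(q)}(f,g) \ge d_{nm}^{(q)}(f,g). \tag{5.28} $$

We also have from (3.3) that $d^{(q)}(f,g) \le 2\,d^{(1)}(f,g)$ for $q \ge 1$, so that

$$ d_{\cosh}(f,g) \ge \frac{1}{2}\,d_{cm}^{(q)}(f,g) \ge \frac{1}{2}\,d_{nm}^{(q)}(f,g). \tag{5.29} $$

To summarize, we have that

$$ \frac{1}{2}\,d_{nm}^{(q)}(f,g) \le \frac{1}{2}\,d_{cm}^{(q)}(f,g) \le d_{cm}^{(1)}(f,g) \le d_{\cosh}(f,g), \tag{5.30} $$

which proves that

$$ d_{\cosh} \Rightarrow d_{cm}^{(q)} \Rightarrow d_{nm}^{(q)}, \qquad q = 1, 2, \ldots \tag{5.31} $$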


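Analogous to (4.10) and (5.13)-(5.15), we have from (4.23) that

$$ \tilde d_{\cosh} \Leftrightarrow d_I^{(1)}. \tag{5.32} $$

We also have the following. First, $d_I \Leftrightarrow d_{ncm}$, and therefore

$$ d_I^{(q)} \Leftrightarrow d_{ncm}^{(q)}. \tag{5.33} $$

Second, $d_{cm} \Rightarrow d_{nm}$, and therefore

$$ d_{cm}^{(q)} \Rightarrow d_{nm}^{(q)}. \tag{5.34} $$

Third, $d_1 \Rightarrow d_{nm}$, and therefore

$$ d_1^{(q)} \Rightarrow d_{nm}^{(q)}. \tag{5.35} $$

For example,

$$ d_1^{(1)}(f,g) = \|f/g - g/f\|_1 \ge 2\,d_{\cosh}(f,g). $$

To summarize for the symmetric case:

$$ d_{\cosh} \Rightarrow d_{cm}^{(q)} \Rightarrow d_{nm}^{(q)} \Leftarrow d_1^{(q)} \tag{5.36} $$

$$ \tilde d_{\cosh} \Leftrightarrow d_I^{(q)} \Leftrightarrow d_{ncm}^{(q)} \tag{5.37} $$

$$ \tilde d_{\cosh} \stackrel{c}{\Leftrightarrow} d_I^{(1)} \tag{5.38} $$

The last relation holds since both measures are minimized by minimizing $r_{f/g}(0)\,r_{g/f}(0)$. Coupled with the fact that all symmetrized distortions imply their unsymmetrized versions, this completes the catalogue of equivalence and implication relations.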
SOME TIME DOMAIN EXPRESSIONS

The emphasis here has been on spectral representations. For completeness, and to ease comparison with alternate forms appearing in the literature, we give some time-domain expressions for the distortion measures defined here.

Most of the distortion measures involve the term $r_{f/g}(0)$. Transforming this equation yields

$$ r_{f/g}(0) = (2\pi)^{-1}\int_{-\pi}^{\pi}\frac{f(\theta)}{g(\theta)}\,d\theta = \sum_{k=-\infty}^{\infty} r_f(k)\, r_{1/g}(k), $$

and hence if $g = \sigma^2/|A|^2$ we have

$$ r_{f/g}(0) = \frac{1}{\sigma^2}\sum_{k=-\infty}^{\infty} r_f(k)\, r_{|A|^2}(k). $$

Alternatively, we can think of this as the power in the output of a filter $1/g^+ = A/\sigma$ with an input process having spectral density $f$. Then

$$ r_{f/g}(0) = \frac{1}{\sigma^2}\sum_{k=0}^{\infty}\sum_{j=0}^{\infty} a_k a_j\, r_f(k - j), $$

where the sum will be finite if $A$ has finite order. This can also be expressed in matrix notation as

$$ r_{f/g}(0) = \frac{1}{\sigma^2}\,\mathbf{a}' R\, \mathbf{a}, $$

where $\mathbf{a}' = (1, a_1, a_2, \ldots)$ is a semi-infinite vector and $R = \{r_f(k-j)\}$ is the doubly semi-infinite correlation matrix. If $A$ has finite order $m$ this reduces to

$$ r_{f/g}(0) = \frac{1}{\sigma^2}\,(\mathbf{a}^m)'\, T_{m+1}(f)\, \mathbf{a}^m, $$

where $(\mathbf{a}^m)' = (1, a_1, \ldots, a_m)$ and $T_{m+1}(f)$ is the $(m+1)\times(m+1)$ Toeplitz matrix $\{r_f(k-j)\}$. In addition, the theory of Toeplitz forms [8,16,17] can be used to write

$$ \sigma^2\, r_{f/g}(0) = \lim_{n\to\infty} n^{-1}\operatorname{tr}\left[T_n(f)\, T_n(|A|^2)\right]. $$

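Lastly, we observe that if $f = \sigma_f^2/|B|^2$, then

$$ \begin{aligned} d_{ncm}(f,g) &= \left\|1 - \frac{f^+/\sigma_f}{g^+/\sigma_g}\right\|_2^2 = \|1 - A/B\|_2^2 = \frac{1}{\sigma_f^2}\,(2\pi)^{-1}\int_{-\pi}^{\pi}\left|\sum_{k=0}^{\infty}(b_k - a_k)e^{-ik\theta}\right|^2 f(\theta)\,d\theta \\ &= \frac{\sum_{k=0}^{\infty}\sum_{j=0}^{\infty}(b_k - a_k)\, r_f(k-j)\,(b_j - a_j)}{\sum_{k=0}^{\infty}\sum_{j=0}^{\infty} b_k\, r_f(k-j)\, b_j} = \frac{(\mathbf{b} - \mathbf{a})' R\, (\mathbf{b} - \mathbf{a})}{\mathbf{b}' R\, \mathbf{b}}, \end{aligned} $$

the form used by Itakura [18] and Chaffee [19].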
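The time-domain forms can be checked against the spectral ones. The sketch below makes these assumptions:

- $f = 1/|B|^2$ with the hypothetical $B(z) = 1 - 0.9z^{-1}$, so $\sigma_f^2 = 1$.
- $g = 2/|A|^2$ with the hypothetical $A(z) = 1 - 0.5z^{-1} + 0.25z^{-2}$.
- The autocorrelations $r_f(k)$ are grid averages of $f(\theta)\cos k\theta$.

It compares $r_{f/g}(0)$ with $\mathbf{a}'R\mathbf{a}/\sigma^2$, and $d_{ncm}$ with $(\mathbf{b}-\mathbf{a})'R(\mathbf{b}-\mathbf{a})/\mathbf{b}'R\mathbf{b}$. Names are illustrative.

```python
import cmath
import math

N = 2048
THETA = [-math.pi + 2 * math.pi * k / N for k in range(N)]


def avg(values):
    """Approximate (2*pi)^-1 * integral over [-pi, pi] by a uniform-grid average."""
    return sum(values) / len(values)


def poly(a, t):
    """A(e^{it}) = sum_k a[k] e^{-ikt}, a[0] = 1."""
    return sum(ak * cmath.exp(-1j * k * t) for k, ak in enumerate(a))


def ar(s2, a):
    """Samples of s2 / |A(e^{it})|^2."""
    return [s2 / abs(poly(a, t)) ** 2 for t in THETA]


m = 2
b = [1.0, -0.9, 0.0]                 # f = 1 / |B|^2 (padded to length m + 1)
a, s2 = [1.0, -0.5, 0.25], 2.0       # g = s2 / |A|^2
f, g = ar(1.0, b), ar(s2, a)

r = [avg([x * math.cos(k * t) for x, t in zip(f, THETA)]) for k in range(m + 1)]
R = [[r[abs(i - j)] for j in range(m + 1)] for i in range(m + 1)]  # T_{m+1}(f)


def quad(u, v):
    """Quadratic form u' R v."""
    return sum(u[i] * R[i][j] * v[j] for i in range(m + 1) for j in range(m + 1))


r_fg_spec = avg([x / y for x, y in zip(f, g)])
r_fg_time = quad(a, a) / s2
diff = [bi - ai for bi, ai in zip(b, a)]
d_ncm_time = quad(diff, diff) / quad(b, b)
d_ncm_spec = avg([abs(1 - poly(a, t) / poly(b, t)) ** 2 for t in THETA])

print(r_fg_spec, r_fg_time, d_ncm_spec, d_ncm_time)
```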


IMPLICATIONS FOR SPEECH COMPRESSION

In this section some implications and applications of the preceding results to speech compression are discussed, and some research directions are proposed. The mathematical model adopted herein is the following.

We are given a sequence of "symbols" $\{f_n\}_{n=-\infty}^{\infty}$. Each symbol $f_n \in \mathcal{T}$ is itself the power spectral density of a stationary, ergodic, nondeterministic random process as in Section 2. That is, the alphabet $\mathcal{T}$ consists of all nonnegative even real-valued functions $f(\theta)$, $\theta \in [-\pi,\pi]$, for which $f, f^{-1} \in L_1$ (and hence $\ln f \in L_1$).

In actual practice each $f_n$ is obtained via a transformation on a windowed speech waveform $\{x(t);\ t \in [nT, (n+1)T)\}$, for example by forming the magnitude squared of the discrete Fourier transform.

We are also given a finite reproduction alphabet $\hat{\mathcal{T}} \subset \mathcal{T}$. A data compression system maps each $f_n \in \mathcal{T}$ into a reproduction $\hat f_n \in \hat{\mathcal{T}}$. A binary index (fixed or variable length) is then transmitted to a receiver, who reconstructs $\hat f_n$. The goal is to minimize the average distortion $E\,d(f_n, \hat f_n)$ for a fixed code rate.

We note that the theory of source coding is valid for such a general alphabet and distortion measure. The requirement is that the source and the reproduction set $\hat{\mathcal{T}}$ are such that $d(f_n, \hat f_n) = \infty$ cannot occur with nonzero probability. This is a physical assumption without which finite average distortion communication is not possible.

A general data compression system can work in one of two ways:

- It can map several input symbols into several output symbols, $(f_{nN}, \ldots, f_{n(N+1)-1}) \to (\hat f_{nN}, \ldots, \hat f_{n(N+1)-1})$, $n = 0, \pm 1, \pm 2, \ldots$, a technique called block coding.
- It can map several input symbols into a single output symbol, $(f_{n-N}, \ldots, f_n, \ldots, f_{n+N}) \to \hat f_n$, $n = 0, \pm 1, \ldots$, a technique called sliding-block coding [21,22].

We here consider the special case $N = 1$ of "single-symbol quantization" of $\mathcal{T}$, a mapping of the form $f_n \to \hat f_n$. LPC compression systems using the autocorrelation method have this form, and they use a "two-step" compressor.

The first step is to transform $f$ into another spectrum $\tilde f$ that is the best $m$th order autoregressive model of $f$. Here $\tilde f = \tilde f(f) \in \mathcal{T}_m$ (all spectral densities of the form $\sigma^2/|A(e^{i\theta})|^2 \in \mathcal{T}$, where $A(z) = 1 + \sum_{k=1}^{m} a_k z^{-k}$), and "best" means that $\tilde f(f)$ minimizes $d_{IS}(f, \tilde f)$. This is a "system identification" step and results in distortion [23]

$$ d_{IS}(f, \tilde f(f)) = \ln(\sigma_m^2/\sigma_f^2). \tag{7.1} $$

In the next step $\tilde f \to \hat f \in \hat{\mathcal{T}} \subset \mathcal{T}_m \subset \mathcal{T}$, where $\hat{\mathcal{T}}$ is a finite collection of $m$th order autoregressive models. There are several ways to construct $\hat{\mathcal{T}}$. The most common is to transform the model $\tilde f$ into a vector of reflection coefficients and gain and to quantize each separately according to some criterion [5,6]. Several criteria are possible for this real-number quantization, but theory and practice have shown that most sensible approaches yield nearly equivalent results [6].

The rate of such a system is $\log_2|\hat{\mathcal{T}}|$ bits per "symbol," where here a symbol means a windowed speech waveform of typically 20 ms.

Such a system is ad hoc and nonoptimal. For example, "optimal" usually means minimizing distortion for a fixed $\hat{\mathcal{T}}$. Yet here one distortion measure ($d_{IS}$) is used in the first step and another in the second, often a magnitude error on reflection coefficients, which is approximately equal to $d_{\log}$ on spectra.

An optimal compressor according to $d_{IS}$, for example, would take $f_n$ and directly find the $\hat f \in \hat{\mathcal{T}}$ minimizing $d_{IS}(f_n, \hat f)$ to form $\hat f(f_n)$. No other algorithm can yield a lower overall Itakura-Saito distortion.

An alternative, conceptually optimum system would use a two-step compressor as before, with the second step forming an optimal quantizer of the $\tilde f \in \mathcal{T}_m$. That is, form $\tilde f(f_n)$ as before and then set $\hat f(\tilde f) \in \hat{\mathcal{T}}$ to be the model minimizing $d_{IS}(\tilde f, \hat f)$. With this system, however, an immediate problem arises. Since $d_{IS}$ is not a metric, how do we know that a good job in each step will yield a good job overall? A good job in each step means $d_{IS}(f_n, \tilde f(f_n))$ and $d_{IS}(\tilde f(f_n), \hat f(\tilde f(f_n)))$ are small; a good job overall means $d_{IS}(f_n, \hat f(\tilde f(f_n)))$ is small.

This leads us to one of the interesting properties of the Itakura-Saito distortion. In two-step systems with the first step as above, it does satisfy a triangle inequality (with equality, actually), and the two LPC systems described yield the same encoding. To see this, let


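$f^{(m)}$ minimize $d_{IS}(f, \hat f)$ over $\hat f \in \mathcal{T}_m$; hence, as previously, $f^{(m)}(\theta) = \sigma_m^2/|\hat A(e^{i\theta})|^2$, where $\hat A$ satisfies (2.10). Let $\hat f \in \mathcal{T}_m$ have gain $\hat\sigma^2$, and use (2.14) and (7.1) to write

$$ \begin{aligned} d_{IS}(f, \hat f) &= (2\pi)^{-1}\int_{-\pi}^{\pi}\frac{f(\theta)}{\hat f(\theta)}\,d\theta - 1 - \ln\frac{\sigma_f^2}{\hat\sigma^2} = (2\pi)^{-1}\int_{-\pi}^{\pi}\frac{f^{(m)}(\theta)}{\hat f(\theta)}\,d\theta - 1 - \ln\frac{\sigma_f^2}{\hat\sigma^2} \\ &= \left[(2\pi)^{-1}\int_{-\pi}^{\pi}\frac{f^{(m)}(\theta)}{\hat f(\theta)}\,d\theta - 1 - \ln\frac{\sigma_m^2}{\hat\sigma^2}\right] + \ln\frac{\sigma_m^2}{\sigma_f^2} = d_{IS}(f^{(m)}, \hat f) + d_{IS}(f, f^{(m)}). \end{aligned} \tag{7.2} $$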


Thus for this type of two-step system, regardless of the quantization rule, the overall Itakura-Saito distortion is exactly the sum of the distortions incurred in the separate system-identification and compression steps. In particular, if the compression of $f^{(m)}$ is done optimally for a fixed set $\hat{\mathcal{T}}$, then it is equivalent to an optimum quantizer operating directly on $f$.

Several questions follow from this:

- It would be of interest to compute $d_{IS}$ for real overall systems, since this would yield a distortion measure consistent with the implicit definition of optimum in the first step.
- It would also be of interest to see which of the various reflection coefficient quantization rules yield the smallest quantization (and hence overall) $d_{IS}$. We conjecture that, as in [6], the results would be very nearly the same, since little improvement is possible when one is constrained to quantize each real parameter separately.
- Finally, it would be of interest to see whether computationally efficient, approximately optimal (in $d_{IS}$) mappings $\mathcal{T}_m \to \hat{\mathcal{T}}$ could be developed.
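The additivity (7.2) and the system-identification distortion (7.1) can be checked on a small example. The sketch rests on these assumptions:

- The first step is the autocorrelation method, implemented with the standard Levinson-Durbin recursion; this is not code from the report.
- The input is a hypothetical ARMA spectrum $f = |1 + 0.5e^{-i\theta}|^2/|1 - 0.9e^{-i\theta}|^2$ with gain 1, so an order-2 model is lossy.
- The "quantized" model $\hat f = 1.3/|1 - 0.6e^{-i\theta} + 0.1e^{-2i\theta}|^2$ is an arbitrary member of $\mathcal{T}_2$ standing in for a codebook entry.

All names are illustrative.

```python
import cmath
import math

N = 2048
THETA = [-math.pi + 2 * math.pi * k / N for k in range(N)]


def avg(values):
    """Approximate (2*pi)^-1 * integral over [-pi, pi] by a uniform-grid average."""
    return sum(values) / len(values)


def poly(a, t):
    """A(e^{it}) = sum_k a[k] e^{-ikt}, a[0] = 1."""
    return sum(ak * cmath.exp(-1j * k * t) for k, ak in enumerate(a))


def ar(s2, a):
    """Samples of s2 / |A(e^{it})|^2."""
    return [s2 / abs(poly(a, t)) ** 2 for t in THETA]


def gain(f):
    """sigma_f^2 = exp(avg ln f)."""
    return math.exp(avg([math.log(x) for x in f]))


def d_is(f, g):
    """Itakura-Saito distortion (4.1)."""
    return avg([x / y - 1 - math.log(x / y) for x, y in zip(f, g)])


def levinson(r, m):
    """Levinson-Durbin: monic A of order m and prediction error sigma_m^2 from r[0..m]."""
    a, err = [1.0], r[0]
    for i in range(1, m + 1):
        k = -sum(a[j] * r[i - j] for j in range(i)) / err
        a = [(a[j] if j < i else 0.0) + k * (a[i - j] if j > 0 else 0.0) for j in range(i + 1)]
        err *= 1 - k * k
    return a, err


f = [abs(poly([1.0, 0.5], t)) ** 2 / abs(poly([1.0, -0.9], t)) ** 2 for t in THETA]
m = 2
r = [avg([x * math.cos(k * t) for x, t in zip(f, THETA)]) for k in range(m + 1)]
a_hat, s2_m = levinson(r, m)
f_m = ar(s2_m, a_hat)                  # step 1: f^(m)
f_q = ar(1.3, [1.0, -0.6, 0.1])        # step 2: hypothetical quantized model in T_m

total = d_is(f, f_q)
split = d_is(f, f_m) + d_is(f_m, f_q)  # (7.2)
print(total, split, d_is(f, f_m), math.log(s2_m / gain(f)))
```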

Next suppose that subjective testing indicates that some other distortion measure is better than $d_{IS}$. This alternative distortion could be used in either of the two kinds of systems: the two-step system or direct quantization.

In a two-step system, using an alternative symmetric measure such as the cosh measure or the log spectral deviation would result in finding a model that matches zeroes as well as poles. Hence $\mathcal{T}_m$ might be replaced by a collection of mixed moving-average and autoregressive (ARMA) models. Unfortunately, however, finding a best finite-order model matching poles and zeroes (or even zeroes alone) seems a very difficult problem. This points out that one of the nice
alternative distortion measure is suggested by subjective testing, but scatter plots such as those of Markel and Gray [2] or Matsuyama [24] suggest that for small distortion the alternative measure is highly correlated with $d_{IS}$, then use $d_{IS}$ as an approximation in Step 1 to facilitate computation, and use the other measure in Step 2. Performance can be slightly improved by replacing $\sigma_m^2$ in Step 1 by the gain that is optimal for the other distortion measure, e.g., the gain minimizing the cosh measure.

If a two-step system is constructed using the causal filter measure, then the behavior is similar to that of the Itakura-Saito system, and one again obtains a triangle inequality. For $g = \sigma^2/|A|^2 \in \mathcal{T}_m$ we have that

$$ d_{cm}(f, \sigma^2/|A|^2) = 1 + \frac{1}{\sigma^2}\,r_{f|A|^2}(0) - 2\,\frac{\sigma_f}{\sigma}. $$

This is minimized by minimizing $r_{f|A|^2}(0)$, i.e., by choosing $A$ to satisfy (2.10), and by choosing the optimum gain (see (4.13)) $\sigma^2 = \sigma_m^4/\sigma_f^2$. Thus the monic filter is the same as that of the Itakura-Saito distance, but the gain is larger. Denote the resulting spectrum $\tilde f(\theta) = \tilde\sigma^2/|\hat A(e^{i\theta})|^2$, where $\tilde\sigma^2 = \sigma_m^4/\sigma_f^2$ and

$$ d_{cm}(f, \tilde f) = \min_{g \in \mathcal{T}_m} d_{cm}(f,g) = 1 - (\sigma_f/\sigma_m)^2. $$

Let $\hat f \in \hat{\mathcal{T}}$ denote the quantized version of $\tilde f$ resulting from the second step. We have from (2.14) that

$$ d_{cm}(f, \hat f) = \left(\frac{\sigma_f}{\sigma_m}\right)^2 d_{cm}(\tilde f, \hat f) + d_{cm}(f, \tilde f) \le d_{cm}(f, \tilde f) + d_{cm}(\tilde f, \hat f), $$

a sort of triangle inequality.
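The causal-model version can be checked the same way, using the same hypothetical setup as the Itakura-Saito two-step check:

- the ARMA input $f$ with causal factor $(1 + 0.5z^{-1})/(1 - 0.9z^{-1})$;
- a Levinson-Durbin first step;
- an arbitrary stand-in codebook entry $\hat f$.

$d_{cm}$ is evaluated from its definition (4.8) through the causal factors $\sqrt{s^2}\,C/D$ of each spectrum. The sketch checks both $d_{cm}(f,\tilde f) = 1 - \sigma_f^2/\sigma_m^2$ and the weighted identity behind the triangle inequality. Names are illustrative.

```python
import cmath
import math

N = 2048
THETA = [-math.pi + 2 * math.pi * k / N for k in range(N)]


def avg(values):
    """Approximate (2*pi)^-1 * integral over [-pi, pi] by a uniform-grid average."""
    return sum(values) / len(values)


def poly(a, t):
    """A(e^{it}) = sum_k a[k] e^{-ikt}, a[0] = 1."""
    return sum(ak * cmath.exp(-1j * k * t) for k, ak in enumerate(a))


def levinson(r, m):
    """Levinson-Durbin: monic A of order m and prediction error sigma_m^2 from r[0..m]."""
    a, err = [1.0], r[0]
    for i in range(1, m + 1):
        k = -sum(a[j] * r[i - j] for j in range(i)) / err
        a = [(a[j] if j < i else 0.0) + k * (a[i - j] if j > 0 else 0.0) for j in range(i + 1)]
        err *= 1 - k * k
    return a, err


def plus(spec, t):
    """Causal factor sqrt(s2) * C/D of the spectrum s2 |C|^2 / |D|^2 (C, D monic, minimum phase)."""
    num, den, s2 = spec
    return math.sqrt(s2) * poly(num, t) / poly(den, t)


def d_cm(p, q):
    """Causal model distortion (4.8): ||1 - p+/q+||_2^2."""
    return avg([abs(1 - plus(p, t) / plus(q, t)) ** 2 for t in THETA])


F = ([1.0, 0.5], [1.0, -0.9], 1.0)    # hypothetical ARMA input, sigma_f^2 = 1
f = [abs(plus(F, t)) ** 2 for t in THETA]
m = 2
r = [avg([x * math.cos(k * t) for x, t in zip(f, THETA)]) for k in range(m + 1)]
a_hat, s2_m = levinson(r, m)

F_tilde = ([1.0], a_hat, s2_m ** 2 / 1.0)     # gain sigma_m^4 / sigma_f^2
F_hat = ([1.0], [1.0, -0.6, 0.1], 1.3)        # hypothetical quantized model

first, second = d_cm(F, F_tilde), d_cm(F_tilde, F_hat)
lhs = d_cm(F, F_hat)
rhs = (1.0 / s2_m) * second + first           # (sigma_f/sigma_m)^2 d_cm(f~, f^) + d_cm(f, f~)
print(first, 1 - 1.0 / s2_m, lhs, rhs)
```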

Another observation on these systems is that implicit (and relevant) subjective testing of the various distortion measures can be carried out. One simulates either type of compression system for a given distortion (likely using $d_{IS}$ in Step 1 of the two-step system for simplicity) and then listens to the reconstructed output.

A reproduction set could be taken as the LPC system reproduction set or, say, the direct quantization reproduction set of Chaffee [19]. Chaffee uses $d_{ncm}$ to select the monic filter part of $\hat f$ and alternative criteria to select the gain and pitch. The point is that a good subjective test of a distortion measure is to listen to the output of a minimum-distortion compressor using that measure.

We are currently studying various rules for selecting the finite reproduction set $\hat{\mathcal{T}}$ from observed data, for efficiently computing the various $d(f, \hat f)$, and for finding the best $\hat f$. Such systems may prove successful, in particular comparable to LPC systems, as Chaffee's [19] work suggests. If so, our feeling is that they will provide an alternative to LPC requiring less
computation but more memory. Success in this approach would also open two other avenues of future research:

1. Compression systems using block or sliding-block coding on the $\{f_n\}$ could be attempted. This may sound prohibitively complex, but if the single-symbol systems were well understood, then the "Fake Process" approach to data compression [25,26] would provide a straightforward ad hoc technique for improving performance using the memory in the $\{f_n\}$.
2. Suppose a large (high-rate) finite class $\hat{\mathcal{T}}$ could be shown to be "rich" enough to model most $\{f_n\}$ well. Then the long-run probabilistic behavior could be approximated by compiling first-order histograms of occurrences of $\hat f$ in real speech. This probabilistic model could be coupled with the distortion measure and the Blahut algorithm [27] to compute a meaningful distortion-rate function for speech. That would give an absolute, unbeatable bound on the performance of single-symbol direct quantizers (such as the LPC and Chaffee systems).

It would be interesting to know how nearly "optimal" the ad hoc but highly successful LPC systems are. It would also be interesting to know whether one must resort to block or sliding-block coding to obtain real improvement over LPC systems. If efficient means of finding conditional histograms could be found, higher-order distortion-rate functions could be obtained, yielding performance bounds on more general systems.

Another problem must be addressed in simulating such systems, that of decoding. How does one convert a spectral density $\hat f$ back into a sound? Mathematically, there exists a random process having such a density. In fact there exist many (Gaussian, for example), and it is important to know which to use.

Again mathematically, the choice makes no difference for the distortion measures considered here, since these all assign zero distortion to two identical spectra. Practically, however, these distortion measures merely approximate the biological distortion measure of the human brain. The actual process used in reconstruction will very likely make a difference in subjective quality.

Here we can only propose to try ad hoc techniques when this stage is reached. A first try would be simply to use Gaussian noise driving the factored spectrum to produce a Gaussian process with the correct (optimal) spectral density. This does not mean that speech "looks" Gaussian, only that Gaussian pseudo-speech may sound like speech. Alternatively, some work [28,29] indicates that a double-sided exponentially distributed white process may perform more satisfactorily. This problem must be treated experimentally with human listeners, since the distortion measures cannot distinguish between underlying statistics except through the spectra.
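The report only proposes this decoding experiment. The sketch below shows its mechanical first step under stated assumptions:

- The spectrum is a hypothetical first-order model $\hat f = s^2/|1 + a_1 e^{-i\theta}|^2$ with $a_1 = -0.9$ and $s^2 = 1$.
- It is realized by driving the all-pole filter $\sqrt{s^2}/A(z)$ with unit-variance white noise.
- The noise is either Gaussian or double-sided exponential (Laplacian, scale $1/\sqrt{2}$).

Both driving processes give the same spectral density, so the sample variance should approach $r_{\hat f}(0) = s^2/(1 - a_1^2)$ in either case. The seed, sample size, and burn-in length are arbitrary.

```python
import math
import random


def synthesize(n, a1, s2, draw, seed=0, burn_in=1000):
    """Drive 1/A(z), A(z) = 1 + a1 z^-1, with sqrt(s2) * unit-variance white noise from draw(rng)."""
    rng = random.Random(seed)
    y_prev, out = 0.0, []
    for i in range(n + burn_in):
        y = math.sqrt(s2) * draw(rng) - a1 * y_prev
        y_prev = y
        if i >= burn_in:          # discard the start-up transient
            out.append(y)
    return out


def gaussian(rng):
    """Zero-mean, unit-variance Gaussian sample."""
    return rng.gauss(0.0, 1.0)


def laplacian(rng):
    """Double-sided exponential sample with unit variance (rate sqrt(2) per side)."""
    mag = rng.expovariate(math.sqrt(2.0))
    return mag if rng.random() < 0.5 else -mag


a1, s2 = -0.9, 1.0
target = s2 / (1 - a1 ** 2)  # r(0) of the hypothetical AR(1) spectrum

y_g = synthesize(200_000, a1, s2, gaussian, seed=1)
y_l = synthesize(200_000, a1, s2, laplacian, seed=2)
var_g = sum(v * v for v in y_g) / len(y_g)
var_l = sum(v * v for v in y_l) / len(y_l)
print(target, var_g, var_l)
```

The two outputs match in second-order statistics but differ in their sample distributions. That difference is exactly what the spectral distortion measures cannot see and what listening tests would have to judge.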

This report was motivated by the research described in this section. It was found useful to have a catalogue of the various distortion measures, their properties, and their interrelations. The experiments proposed here will likely take considerable time to reach any solid conclusions. The preliminary work on the distortion measures has therefore been collected now, in the hope of being useful to others conducting similar research.